Chapter 1
Introduction 


Cooperative control systems are emerging as significant alternatives to their
centralized counterparts. The rising interest in deploying cooperative systems is
fueled by the development of decentralized systems with low cost and performance advantages.
For example, mobile exploration and information gathering tasks can often be accomplished
more cheaply and more reliably by swarms of small autonomous robots than by a single,
more sophisticated one. Cooperative control is also applied in many tasks that cannot
be performed by a single system, e.g. satellite arrays that enable global communication,
geographically remote systems that communicate via a network, and others.

Figure 1.1: A swarm of robots is expected to explore unknown planets.

The goal of our research is to investigate optimal control in cooperative systems, using
algorithms inspired by biology. We begin with a review of collective behavior in biological
systems.

1.1  Cooperative  Biological  Systems 

Animal aggregation is a common phenomenon in nature, seen in organisms that range in
complexity from primitive zooplankton to advanced mammals. Many species exhibit collective
movement patterns that are highly organized compared to the seemingly random behaviors
of individuals. For example, a school of fish can move together in a tight formation and respond
almost as fast as a single organism to evade danger. Worker honey bees can
distribute themselves among different nectar sources in accordance with the profitability of each
source. Ants can recruit their nest-mates to form a trail along the most efficient path between
the nest and food when foraging [1, 2].

The above examples show that aggregate behaviors in these animals may have special
group-level properties that go beyond the ability of an individual. Certainly, if the behaviors
of all group members were coordinated by a centralized “leader”, the leader would need the
capability to communicate with the others and alter their behaviors. Observing the qualitatively
identical behaviors of all members of an insect aggregate, as well as their physical limitations,
we can conclude that there are no such leaders in these groups (a conclusion supported by
other research [1, 2, 4]). Therefore, some of the awe-inspiring group behaviors in nature
come about as the result of individuals’ self-organized actions. For instance, at the individual
level honey bees receive limited information from other workmates and go to forage at the
selected flowers. This type of behavior would seem to lead to a random distribution over the
different sources, because the message each bee obtains does not convey accurate information
about the profitability of each nectar source. At the group level, however, foragers are
rationally dispatched over different flowers in accordance with the
distribution of nectar over the various sources. Coordinating a colony of bees to achieve such a
complicated collective behavior seems very difficult for any individual bee. A reasonable
explanation is that bees follow only some simple rules throughout their foraging activities, while
the collective behavior turns out to be highly organized [1]. In short, the individual
behavior is “unsophisticated” because of the individuals’ physical limitations, in contrast
to the complex performance of the whole group. This suggests that there is
an intrinsic mechanism in insect aggregates that overcomes individuals’ drawbacks
and yields results that might be impossible for individuals to attain. It is the cooperation
between group members, i.e. the rule that each individual complies with, that yields group
patterns¹ qualitatively different from and more elegant than those of individual behaviors.

¹We use “group pattern” to refer to the collective movement pattern of a group.
We have seen that biological systems, especially social insects, demonstrate many promising
cooperative solutions to complicated tasks. Many of these tasks are similar (at least
functionally) to what one might want to do with cooperative engineered collectives. In
addition, individual members of a biological collective are similar to the units of a cooperative
control system in the sense that they are equipped with limited capabilities for sensing,
communicating and computing. What we are essentially interested in is trading individual
capability for cooperation, in order to achieve a complex task with less sophisticated
equipment: low power, short sensing range and low communication burden. We look to natural
examples for successful prototypes, such as ants that are able to find the most efficient path
even though individual ants have short sight and limited intellect. Natural systems have developed
such capabilities to solve various problems through evolution and natural selection, and may
offer us some clues on how to proceed [1, 26].

1.2  Research  Objectives 

The objective of this work is to investigate the cooperative solution of a class of optimal
control problems using groups of agents² with limited sensing and computing capabilities.
Our approach will be to postulate rules for individual behavior, inspired by observations
of biological systems, and then investigate the “group pattern” that emerges. Rules for
individual agents will be obtained by:

1. Constructing a proper model for the observed collective movement patterns of certain
biological systems, including ant colonies. An effective model will allow us to capture
some aspects of the “experience” accumulated through natural selection.

2. Extracting simple “rules” that capture individual behavior within the group. These
rules should be kept simple, with respect to the computation and communication
resources required to implement them, so that they can be applied to cooperative control
systems, such as cheap autonomous robots.

3. Exploring how these rules can be applied to artificial collectives in order to solve
optimal control problems that are hard or impossible for an individual to solve.
This will involve combining existing methods in optimal control with the specified rules.

²Throughout the document we will use “agent” to refer to a member of a group of control systems.
1.3 Outline


The rest of this paper is organized as follows. In Chapter 2 we review various recent
research directions in cooperative systems. A class of algorithms for cooperative optimal
control, inspired by the observed movements of ant colonies, is introduced in Chapter
3, along with a discussion of the algorithms’ potential advantages. Chapter 4 presents
current progress on the proposed algorithms, including convergence analysis, special
cases and numerical experiments. Finally, the ongoing work and an outline of possible
approaches for its completion are given in Chapter 5.
Chapter 2

Literature  Review 


The potential of a cooperating group to “do better than the sum of its parts” has already
seeded a variety of recent research directions in engineering, from modeling of animal groups
[1, 24, 25, 26], to distributed collective covering and searching [28, 29], estimation by groups
[30, 31, 32], cooperative robotic teams [33, 34, 42] and biologically-motivated optimization
[36, 27]. These works typically treat narrowly defined problems [36, 32, 27], discuss only
the feasibility of special tasks [33, 34, 42], or show the effectiveness rather than the optimality
of various proposed algorithms [28, 29]. Here, we review some of these and other relevant
works.


2.1  Animal  Group  Pattern  Modeling 

The work of [24] proposed a simple model for the movement of $n$ autonomous agents
with the same speed but with varying headings. If each agent of a group uses the “nearest
neighbor rule” to update its heading, that is

$$\theta_i(t+1) = \langle \theta_i(t) \rangle_r = \frac{1}{1 + n_i(t)} \Big( \theta_i(t) + \sum_{j \in \mathcal{N}_i(t)} \theta_j(t) \Big)$$

where $\theta_i(t)$ is the heading of the $i$-th agent, $\mathcal{N}_i(t)$ is its set of neighbors
and $n_i(t)$ is the number of neighbors of the $i$-th agent at time $t$, then all agents’ headings
will converge to a common constant heading as time goes on. The theoretical explanation for
the convergence described in the above model is provided in [25], along with several similar
models inspired by [24], such as “leader following”, which shows that if there exists an agent
acting as the “leader” in the group, all agents will eventually point in the same heading as the
leader. This “nearest neighbor rule” can cause all the members of a group to move in the same
direction despite the fact that there is no centralized coordination and that an agent’s set of
nearest neighbors might change as the system evolves. The models developed by [24, 25] have
been used to explain how a group of birds or fish manages to move in tight formation as a
single entity.
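The nearest neighbor rule can be illustrated with a short numerical sketch. The function name `nearest_neighbor_step`, the sensing `radius` and the fixed agent positions are assumptions made for illustration only; they do not come from [24]. Headings are averaged as plain numbers, so angle wrap-around is ignored.

```python
import math


def nearest_neighbor_step(headings, positions, radius):
    """One synchronous update: each agent averages its own heading with
    the headings of all agents within `radius` (its current neighbors)."""
    updated = []
    for i, (theta_i, p_i) in enumerate(zip(headings, positions)):
        neighbor_headings = [
            headings[j]
            for j, p_j in enumerate(positions)
            if j != i and math.dist(p_i, p_j) <= radius
        ]
        # (theta_i + sum over neighbors) / (1 + n_i)
        updated.append((theta_i + sum(neighbor_headings)) / (1 + len(neighbor_headings)))
    return updated


# Three mutually visible agents (hypothetical values).
headings = [0.0, 0.3, 0.6]
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(nearest_neighbor_step(headings, positions, radius=2.0))
```

When every agent sees every other agent, a single update already brings all headings to the group average; with a smaller radius the averaging is only local and agreement, if reached, takes many steps.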


Figure  2.1:  The  flow  diagram  illustrates  the  model  of  how  honey  bees  allocate  the  foragers. 

Another mathematical model is constructed in [1] to describe the foraging activities of
worker honey bees. Each honey bee follows certain rules to determine where it will
go to forage. This process is described by the flow diagram in Fig. 2.1. At the
branch points of the diagram, honey bees decide which nectar source to forage at
and whether or not to dance (the way honey bees transfer information). The decision-making
process is modeled by the probabilities of taking the various actions. For example,
$P_X^A$ represents the probability that a bee watches other dancers after it unloads the nectar
collected from flower A, $P_d^A (1 - P_X^A)$ represents the probability of dancing for flower A, and
$P_F^A$ represents the probability of following other dancers to forage at flower A. Noticing that
honey bees make decisions only after receiving limited information from their workmates, [1]
proposed a set of simple equations to describe these probabilities, e.g.

$$P_F^A = \frac{D_A d_A}{D_A d_A + D_B d_B}$$

where $D_A$ represents the number of dancers for flower A and $d_A$ is the proportion of time that
these foragers actually dance; $D_B$ and $d_B$ are defined analogously for flower B. Other
probabilities, such as $P_X^A$, can be assumed to be constant.
Simulations showed a collective result that was qualitatively similar to what is observed in
real bee colonies.
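As a small worked example of the follow probability, the sketch below evaluates the ratio for two nectar sources. The function name and the tie-breaking value returned when nobody dances are assumptions for illustration, not part of the model in [1].

```python
def follow_probability(dancers_a, dance_time_a, dancers_b, dance_time_b):
    """Probability that a follower bee is recruited to source A:
    D_A * d_A / (D_A * d_A + D_B * d_B)."""
    weight_a = dancers_a * dance_time_a
    weight_b = dancers_b * dance_time_b
    total = weight_a + weight_b
    if total == 0:
        # Assumption: with no dancing at all, neither source is preferred.
        return 0.5
    return weight_a / total


# Ten dancers for A dancing half the time versus five for B dancing all the time.
print(follow_probability(10, 0.5, 5, 1.0))
```

The ratio depends only on the total dancing effort advertised for each source, which is the limited information a follower actually receives.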
2.2 Models of Ant-Trail Formation


One of the awe-inspiring phenomena in nature is the foraging activity of ant colonies, which
includes discovering food, recruiting nest-mates and forming trails. When an ant finds
food, it recruits nearby ants to carry food back to the nest. These co-workers
rapidly form a well-defined trail between the nest and the food, although they are homogeneously
distributed at first. Finding an efficient route between the nest and the food seems too complicated
a problem for an individual ant to solve, especially if one considers the ant’s tiny size relative
to obstacles in the environment, such as stones, sticks and crevices. Nonetheless, a colony of
ants seems always to be able to complete this task [1]. To explore the intrinsic mechanism
that leads to collective efficiency as opposed to individual clumsiness, several models
of ant-trail formation have been proposed.

The work of [1] described a model of how ants use pheromonal secretions to
choose among pathways. According to this model, ants lay pheromonal secretions along their
paths to mark a trace and recruit other nest-mates. At the same time, the pheromonal
secretions evaporate as time goes on. When an ant comes to a location where several traces
cross, it will try to follow the path with the highest concentration. As illustrated in Fig. 2.2,
the probability of taking the left branch of a “fork” in the terrain is quantified as

$$P_L = \frac{(k + C_L)^n}{(k + C_L)^n + (k + C_R)^n}$$

The parameters $C_L$ and $C_R$ represent the pheromone concentrations on the left and right
branches, while $n$ and $k$ are constants corresponding to the degree of nonlinearity of the choice and
the attraction of an unmarked branch, respectively. The key point is that the pheromonal
secretions play a “positive feedback” role. Although an individual ant knows little about the
entire environment and the distribution of its co-workers, simulations show that the colony
has the collective potential to find the shortest path.

Figure 2.2: An ant chooses the path in accordance with pheromone concentrations.
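The branch-choice formula is easy to evaluate directly. In the sketch below the function name and the parameter values used in the call are illustrative assumptions; it also assumes that `k` or the concentrations are positive so that the denominator is nonzero.

```python
def left_branch_probability(c_left, c_right, k, n):
    """Probability of taking the left branch at a fork:
    (k + C_L)^n / ((k + C_L)^n + (k + C_R)^n)."""
    left = (k + c_left) ** n
    right = (k + c_right) ** n
    return left / (left + right)


# A marked left branch against an unmarked right branch (hypothetical values).
print(left_branch_probability(c_left=20, c_right=0, k=20, n=2))
```

With equal concentrations the choice is unbiased. A larger `n` makes the preference for the more strongly marked branch sharper, which is how the positive feedback enters the model.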

Another model of ant-trail formation on a plane was explored in [26]. The
basic rule in this model is that each ant “follows” one of its co-workers instead of measuring
pheromonal secretions, as Fig. 2.3 illustrates. In [26], the pheromonal secretions laid by an
ant are used to trace its own trail and find its way back to the nest, but not to recruit its
nest-mates. Paraphrasing [26], the path traveled by a single ant is a curve
$x_k(t) : [0, T] \to \mathbb{R}^2$ with $\dot{x}_k = u(x_k)$ ($u \in \mathbb{R}^2$). The boundary
conditions for these systems are $x_0(0) = x_0$ and $x_0(T) = x_f$, which represent the starting
point (nest) and the target point (food) respectively.
Any ant can trace its own trajectory back to the nest, so that we have a sequence of ants
departing from $x_0$. Each ant moves with unit speed and there are $\Delta$ units of time between
the departure times of successive ants. At every instant, each ant except the first one
follows its predecessor by pointing its velocity vector in a straight line toward the predecessor.
In short, for $k = 1, 2, 3, \ldots$

$$\dot{x}_k(t) = \frac{x_{k-1}(t) - x_k(t)}{\| x_{k-1}(t) - x_k(t) \|}$$

with $x_k(t) = x_0$ for $t < k\Delta$ and $x_k(T) = x_f$ if $x_k$ reaches the target $x_f$. For the case when
$x_k(t) \in \mathbb{R}^2$, it has been shown that if the initial ant $x_0(t)$ has access to a sub-optimal path
from $x_0$ to $x_f$, then the trajectories $\{x_k\}$ will converge to a straight line connecting $x_0$ and
$x_f$.

Figure 2.3: Ants find the shortest path joining two members.
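The pursuit law above can be approximated numerically with a forward Euler step. This is only a sketch: the function name `pursuit_step`, the time step `dt` and the rule of snapping onto the leader when it is closer than one step are assumptions made here, not part of [26].

```python
import math


def pursuit_step(follower, leader, dt):
    """Move the follower for dt time units at unit speed straight toward
    the leader's current position (forward Euler step of the pursuit law)."""
    dx, dy = leader[0] - follower[0], leader[1] - follower[1]
    dist = math.hypot(dx, dy)
    if dist <= dt:
        # Assumption: the follower stops at the leader instead of overshooting.
        return leader
    return (follower[0] + dt * dx / dist, follower[1] + dt * dy / dist)


# Follower at the origin chasing a leader at (3, 4) for one time unit.
print(pursuit_step((0.0, 0.0), (3.0, 4.0), 1.0))
```

Repeating this step while the leader moves along its own path produces the follower's trajectory; chaining followers, each one chasing its predecessor, gives the sequence $\{x_k\}$.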

2.3  Distributed  Covering  and  Searching 

Inspired by the fact that ants and other insects use pheromones for various communication
and coordination tasks, [28] developed robust adaptive algorithms to perform tasks requiring
the traversal of an unknown region, such as cleaning the floor of an unmapped building.
The region to be covered is described by a graph $G = (V, E)$, where every vertex represents
an “atomic region” (tile). When the agents deployed in the algorithms travel on $G$,
they mark their trails by depositing a pheromone, which evaporates as time goes on. By this
mechanism, the agents can assign to each edge of the graph, which represents the neighborhood
relation between two “atoms”, a label indicating the time of the most recent traversal
of that edge. An agent visiting vertex $u \in V(G)$ checks the labels on all edges emanating
from $u$, and then goes in the direction that has not been visited for the longest time by choosing
the smallest label. The time needed to cover all edges of the graph by $k$ agents under the
“ANT-WALK-1” rule, which is based on the above idea, is bounded as


where $\Delta$ is the maximum vertex degree in $G$, $n = |V(G)|$, $a$ is related to the measurement
noise and $\rho(G)$ is the cut-resistance of $G$. In the same work, the “ANT-WALK-2” rule, a
generalization of the well-known depth-first search algorithm, was developed for agents with
limited amounts of memory. The time for this rule is bounded as


where  the  notation  is  as  before. 
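The smallest-label rule described above can be sketched for a single agent on a small graph. This is not the full ANT-WALK-1 algorithm of [28]: the function name, the use of the step counter as the label, the label 0 for untraversed edges and the tie-breaking by vertex id are all assumptions made for illustration, and noise and multiple agents are ignored.

```python
def smallest_label_walk(adjacency, start, steps):
    """Walk on an undirected graph, always leaving the current vertex along
    the incident edge whose label (time of last traversal) is smallest."""
    labels = {}          # frozenset({u, v}) -> time of most recent traversal
    path = [start]
    current = start
    for t in range(1, steps + 1):
        # Untraversed edges count as label 0; ties go to the smaller vertex id.
        nxt = min(
            adjacency[current],
            key=lambda v: (labels.get(frozenset((current, v)), 0), v),
        )
        labels[frozenset((current, nxt))] = t
        current = nxt
        path.append(current)
    return path, labels


# A triangle graph (hypothetical example).
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path, labels = smallest_label_walk(triangle, start=0, steps=3)
print(path, len(labels))
```

On the triangle the agent traverses each edge once in three steps, because at every vertex the edge it has just used carries the largest label and is therefore avoided.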


The work of [29] investigated the performance of cooperative strategies that control
autonomous air vehicles searching a dynamic environment to gather information. The
proposed framework considers two main components for each agent: distributed learning of the
environment and distributed path planning based on the information gathered. The collective
results, based on a recursive q-step-ahead planning technique as well as an interleaved planning
technique, illustrate that cooperation among vehicles improves performance. The authors also
explored the feasibility of developing coordination control strategies inspired by the social
foraging activities of E. coli, a common bacterium.


2.4  Distributed  Localization  and  Estimation 

The study of [30, 31] proposed a method called “Cooperative Positioning System (CPS)”
to alleviate the weaknesses of traditional position identification techniques usually applied in
robotics, including dead reckoning and landmark-based methods. In that work, a robot group is
divided into two teams in order to provide “portable landmarks”. At every instant, one team moves
while the other stays static, acting as the landmark; then they exchange roles. Therefore,
each team can benefit from accurate measurement by utilizing static landmarks, while at
the same time no prior placement of landmarks is required. The drawback is that at least one
robot must remain stationary, so the overall speed of the algorithm is restricted.

Another approach has been presented in [32] to simultaneously localize a group of mobile
robots with respect to one another’s positions. Each robot measures its own motion using its
proprioceptive sensors. When two robots $x_i, x_j$ meet, they share information with one
another; the $i$-th robot then updates the estimate of its own position using
the $j$-th robot’s estimate and the estimate of the relative distance between the two robots. The preceding
propagation and update steps are described by the Kalman filter equations in [32]. This method
distributes what would be a centralized estimation process among $M$ Kalman filters, each
of them operating on a different robot.

2.5  Group  Formations 

The work of [33] derived a framework that allows robots equipped with range sensors to
control their states in order to accomplish search or rescue operations. The authors
derived three formation controls, “Separation-Bearing Control”, “Separation-Separation
Control” and “Separation Distance-To-Obstacle Control”, with respect to neighboring
robots or obstacles in the environment. The “basic formation” framework is constructed
using the above formation controls and is proved to stabilize the formation of a
robot team. Lastly, that work outlined a coordination strategy allowing switches between
control policies for maintaining the formation in situations with constraints on the sensors,
actuators and the environment.

A smooth time-varying feedback control law is developed in [34] to organize formations
of multiple nonholonomic wheeled mobile robots. Each robot senses the relative positions
of its neighboring robots in its own coordinate system Sj. The formation control is described
by a vector called the “formation vector”. Because it is hard to obtain asymptotically stable
performance for robots with nonholonomic constraints via smooth static-state feedback
controls, the authors used a time-varying feedback control law to obtain the desired velocity
for each agent. Using an analytical method based on averaging theory, the group formation
under this mechanism is proved to be asymptotically stable.

Another coordination strategy for vehicle group maneuvers, including translation,
rotation, expansion and contraction, is presented in [42] through the construction of artificial
potentials and virtual leaders. The control applied to each vehicle is defined as a linear
combination of the gradients of these potentials together with a linear damping term:

$$u_i = -\sum_{j \neq i}^{N} \nabla_{x_i} V_I(x_{ij}) - \sum_{k=1}^{M} \nabla_{x_i} V_h(h_{ik}) + f_{v_i}$$

where $u_i = \ddot{x}_i$ is the control, $x_{ij}$ is the distance between the $i$-th vehicle and the $j$-th
vehicle, $h_{ik}$ is the distance between the $i$-th vehicle and the virtual leader $k$, and $f_{v_i}$
denotes the linear damping term. The artificial potentials $V_I$ and $V_h$ produce attraction to
distant neighbors as well as repulsion from neighbors that are too close. The desired mission is
accomplished by controlling the direction of the virtual leaders’
motion, while the speed of the virtual leaders is chosen to ensure the convergence of the formation.
The convergence property is proved by Lyapunov’s method.


2.6  Biologically-Motivated  Optimization 


The work of [36] introduced a search methodology based on a “distributed autocatalytic
process” to solve a classical optimization problem, the Traveling Salesman Problem (TSP).
Inspired by the fact that ants can use pheromonal secretions to find the shortest path
when foraging, [36] used a team of ants to travel through the towns in the TSP. The transition
probability from town $i$ to town $j$ for the $k$-th ant is defined as

$$p_{ij}^k(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha} [\eta_{ij}]^{\beta}}{\sum_{l \in \mathcal{N}_k} [\tau_{il}(t)]^{\alpha} [\eta_{il}]^{\beta}} & \text{if } j \in \mathcal{N}_k \\ 0 & \text{otherwise} \end{cases} \tag{2.1}$$

where $\mathcal{N}_k$ is the set of towns reachable by ant $k$ and $\tau_{ij}(t)$ is the intensity of the
pheromonal trail on edge $(i, j)$ at time $t$, which is laid by ants on the edge and evaporates as time goes on.
The visibility of the path, $\eta_{ij}$, is defined as the reciprocal of the distance $d_{ij}$ between town
$i$ and town $j$, i.e. $\eta_{ij} = 1/d_{ij}$. Lastly, $\alpha$ and $\beta$ are parameters weighting the relative
importance of the trail and the visibility, respectively. Based on Eq. (2.1), [36] developed
three algorithms, “ant-cycle”, “ant-density” and “ant-quantity”, each based on slightly
different rules by which ants update $\tau_{ij}(t)$ along their trails. The trajectories of the ant
team in each algorithm eventually converge to the optimal tour for the TSP.
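A transition rule of the form in Eq. (2.1) can be evaluated as follows. The function name, the nested-list layout of `tau` and `dist` and the numbers in the call are illustrative assumptions, not taken from [36]; the pheromone update rules of the three algorithms are not shown.

```python
def transition_probabilities(i, reachable, tau, dist, alpha, beta):
    """Probability of moving from town i to each town j in `reachable`,
    proportional to tau[i][j]**alpha * (1 / dist[i][j])**beta.
    Towns that are not reachable are omitted, i.e. have probability 0."""
    weights = {
        j: (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta)
        for j in reachable
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


# Three towns, uniform pheromone, town 1 twice as close to town 0 as town 2.
tau = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]
print(transition_probabilities(0, [1, 2], tau, dist, alpha=1.0, beta=1.0))
```

With uniform pheromone the choice is driven by visibility alone, so the nearer town is preferred; as trails build up on good tours, the pheromone factor increasingly dominates.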


The “probabilistic pursuit” algorithm for a group of agents moving on a planar grid was
presented in [27]. Briefly, a sequence of agents $A_0, A_1, \ldots$ depart from the origin at times
$t = 0, \Delta, 2\Delta, \ldots$ and move toward a destination. While moving on the grid, $A_{n+1}$ “chases”
$A_n$ by making a random choice of a neighboring grid point and moving there. The probability
distribution that defines the agent’s choice is determined by its position relative to its
predecessor, that is

$$A_{n+1}(t+1) = A_{n+1}(t) + \delta_{n+1}(t+1)$$

where $\delta_{n+1}(t+1) \in \{1, -1, j, -j\}$ and

$$\mathrm{Prob}\{\delta_{n+1}(t+1) = \mathrm{sign}(d_x)\} = \frac{\|d_x\|}{d}, \qquad \mathrm{Prob}\{\delta_{n+1}(t+1) = j \cdot \mathrm{sign}(d_y)\} = \frac{\|d_y\|}{d}$$

where $A_n(t)$ is the position of the $n$-th agent at time $t$, $d = \|d_x\| + \|d_y\|$, and $d_x, d_y$ are the
relative distances between $A_{n+1}$ and $A_n$ in the $x$ and $y$ directions, respectively. Analysis
and simulations show that the average trajectories of the agents converge to a straight line on
the plane. This work is related to the problem of discovering optimal trajectories that will
be the focus of this research. It is, of course, restricted to a discretized plane.
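One step of the probabilistic pursuit rule can be sketched as below. Grid points are represented as integer pairs rather than complex numbers, and the function name, the `rng` argument and the choice to stay put once the leader is reached are assumptions made for illustration rather than details from [27].

```python
import random


def probabilistic_pursuit_step(follower, leader, rng=random):
    """Move the follower one grid step toward the leader: along x with
    probability |d_x| / d, otherwise along y, where d = |d_x| + |d_y|."""
    dx = leader[0] - follower[0]
    dy = leader[1] - follower[1]
    d = abs(dx) + abs(dy)
    if d == 0:
        # Assumption: the follower stays put once it sits on the leader.
        return follower
    if rng.random() < abs(dx) / d:
        return (follower[0] + (1 if dx > 0 else -1), follower[1])
    return (follower[0], follower[1] + (1 if dy > 0 else -1))


rng = random.Random(0)  # seeded generator so that runs are repeatable
print(probabilistic_pursuit_step((0, 0), (3, 4), rng))
```

Each step reduces the grid (Manhattan) distance to the leader by exactly one, and the expected displacement points along the straight line toward the leader, which is why the averaged trajectories straighten out.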
Chapter 3

Biologically  Inspired  Algorithms  for 
Optimal  Control 


In this chapter we introduce a class of algorithms inspired by ant-trail formation and discuss
their potential advantages. Recall that there are already several effective models of ant-trail
formation [1, 26], which explain how a colony of ants finds the shortest path between
two points and which have already seeded some applications [36, 28]. We are particularly interested in
the simplicity of the model in [26]. However, [26] applies only to a very narrow domain
(the plane $\mathbb{R}^2$, with holonomic, kinematic vehicles). We would like to extend it to a much broader
class of optimization problems, including many classical problems in optimal control. Before
proceeding with the algorithms, we describe the precise problems we are concerned with.

3.1  Problem  Statement  and  Notation 

For our purposes, the agents are assumed to be a number of “copies” of a dynamical system,
i.e. for $k = 0, 1, 2, \ldots$

$$\dot{x}_k = f(x_k, u_k), \qquad x_k \in \mathbb{R}^n, \quad u_k \in \Omega \subset \mathbb{R}^m \tag{3.1}$$

Physically, each copy of Eq. (3.1) could stand for a robot, UAV or other autonomous system.

What we discuss here are some classical trajectory optimization problems for systems
evolving under Eq. (3.1) with fixed end points. Each function $x_k(t) : [0, T] \to \mathbb{R}^n$ represents
a trajectory defined by the $k$-th agent’s movement. For simplicity, let us start with the problem
with fixed final time.
Fixed Final Time Problems


Assume the starting state $x_0$ and target state $x_f$ are equilibrium points of Eq. (3.1) for
$u = 0$¹, i.e.

$$\dot{x}_k(t) = f(x_k(t), 0) = 0 \quad \text{if } x_k(t) \in \{x_0, x_f\}$$

The problem we are concerned with is finding a trajectory $x^*(t)$ that minimizes the cost
function

$$J(x, \dot{x}, t_0, T) = \int_{t_0}^{t_0+T} g(x(t), \dot{x}(t), t)\,dt \tag{3.2}$$

with $x(t_0) = x_0$, $x(t_0 + T) = x_f$ and subject to $\dot{x} = f(x, u)$.

The cost function covers various categories of optimal control problems, e.g.
$g(x(t), \dot{x}(t), t) = \|\dot{x}\|$ (length minimization).

Let $D \subset \mathbb{R}^n$ be a domain containing states $a$ and $b$. Assume $0 < \sigma \le T$ and $t_0 > 0$. The
optimal trajectory from $a$ to $b$ in fixed $T$ units of time is defined to be $x^*(t)$ ($t \in [t_0, t_0 + T]$)
satisfying:

$$J(x^*, \dot{x}^*, t_0, T) = \min_{x} J(x, \dot{x}, t_0, T) \quad \text{subject to } x(t_0) = a,\ x(t_0 + T) = b \tag{3.3}$$

For notational convenience, we define the cost of following $x^*(t)$ for $\sigma$ units of time as:

$$\eta(a, b, T, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} g(x^*(t), \dot{x}^*(t), t)\,dt, \qquad \sigma \le T \tag{3.4}$$

where the optimal trajectory $x^*(t)$ is defined in Eq. (3.3).
For a generic trajectory $x(t)$, we define

$$C(x, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} g(x(t), \dot{x}(t), t)\,dt \tag{3.5}$$

to be the cost incurred along $x(t)$ during $[t_0, t_0 + \sigma)$.
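For the length-minimization case $g = \|\dot{x}\|$, the cost of a trajectory is simply its arc length, so it can be approximated from sampled positions. The sketch below assumes the trajectory is given as a list of points and treats it as a polyline; the function name is a hypothetical choice.

```python
import math


def trajectory_cost(samples):
    """Approximate the cost of a trajectory for g = ||x_dot||, i.e. its
    length, by summing the distances between consecutive sampled positions."""
    return sum(math.dist(p, q) for p, q in zip(samples, samples[1:]))


# A straight path versus a detour between the same end points.
print(trajectory_cost([(0.0, 0.0), (3.0, 4.0)]))
print(trajectory_cost([(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]))
```

For this particular cost the result does not depend on how fast the path is traversed, only on its shape; a cost that depends on $t$ or on the speed would need the sample times as well.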


Free  Final  Time  Problems 

Consider a class of optimal control problems with free final time (such as minimum-time
control), where we are trying to find a trajectory $x^*(t)$ and a best final time $\Gamma > 0$ that
minimize the cost function

$$J_F(x, \dot{x}, t_0) = \int_{t_0}^{t_0+\Gamma} g(x(t), \dot{x}(t), t)\,dt \tag{3.6}$$

with the restriction that $x(t_0) = x_0$, $x(t_0 + \Gamma) = x_f$ and $\Gamma > 0$. The cost of the optimal
trajectory $x^*(t)$ ($t \in [t_0, t_0 + \Gamma]$) from $a$ to $b$ is defined as:

$$J_F(x^*, \dot{x}^*, t_0) = \min_{x, \Gamma} J_F(x, \dot{x}, t_0) \quad \text{with } x(t_0) = a,\ x(t_0 + \Gamma) = b \text{ over all } \Gamma > 0 \tag{3.7}$$

The cost of following the optimal trajectory for $\sigma$ units of time is defined as

$$\eta_F(a, b, t_0, \sigma) = \int_{t_0}^{t_0+\sigma} g(x^*(t), \dot{x}^*(t), t)\,dt, \qquad \sigma \le \Gamma \tag{3.8}$$

where $x^*(t)$ is defined in Eq. (3.7).

¹Otherwise we can assume there exist $u_0$ and $u_f$ such that $f(x_0, u_0) = f(x_f, u_f) = 0$.

3.2  A  Class  of  Bio-Inspired  Pursuit  Algorithms 

In the model of ant-trail formation described in [26], each ant tries to catch up with its
predecessor in the most “efficient” way, namely by pointing its velocity vector towards
its predecessor. The trajectories generated by the movements of the ants are gradually optimized
and the trajectory sequence converges to a straight line on $\mathbb{R}^2$, as illustrated in Fig. 3.1. The
work in [18] extended the above approach to uneven terrains. Both [18] and [26] separated
the task of finding a geodesic over long distances into many simpler tasks of seeking geodesics
connecting nearby points. The difficulty of “following” increases with the
distance between the predecessor and the successor and with the complexity of the terrain.
It is easier for an ant to aim at its leader and move on a shortest path toward it if they
are close, whether on a plane or on a terrain. The same holds for a robot that has limited sensing
range and computing ability.

Figure 3.1: A geodesic discovery process on a plane.
Figure 3.2: It is easier to solve an optimization problem within a “small” region.


Figure  3.3:  Expanding  the  algorithm  to  more  general  optimization  problems  on  a  manifold. 

We are interested in generalizing the existing approaches in [26, 18] to the much broader class of optimization problems in Eqs. (3.3) and (3.6), using an iterative strategy that requires little communication and only short-range sensing. There is an analogy here between optimal control problems, which are easier to solve when the boundary conditions are “close” to one another, and members of a collective, which are easier to follow from a close distance, as Fig. 3.2 illustrates. Our idea is to seek optimal trajectories locally, by means of “local pursuit”, and to combine the efforts of a group of agents to gradually optimize an initial solution.

Our approach is to propose a set of iterative rules that generalize the idea of pursuit to settings with non-trivial geometry and to agents with non-trivial dynamics. If this approach succeeds, then complicated tasks can be separated into simpler tasks and accomplished by a group of “inexpensive” agents. The following algorithm prescribes the evolution of a group, given an initial feasible trajectory.

Algorithm 1 (Sampled Local Pursuit): Identify two states $x_0$ and $x_f$ on $B$. Let $x_0(t)$ ($t \in [0,T]$) be an initial trajectory satisfying Eq. (3.1) with $x_0(0) = x_0$, $x_0(T) = x_f$. Choose the following interval $\Delta$ and updating interval $\delta$ such that $0 < \delta < \Delta < T$. Then follow the next rules for the $k$th agent.

1. For $k = 1, 2, 3, \ldots$, let $t_k = k\Delta$ be the starting time of the $k$th agent. Let $u_k(t) = 0$, $x_k(t) = x_0$ for $0 \le t \le t_k$.

2. When $t = t_k + i\delta$, $i = 0, 1, 2, 3, \ldots$, calculate $u_t^*(\tau)$ such that $f(x_k(\tau), u_t^*(\tau)) = \dot{x}_t^*(\tau)$, where $x_t^*(\tau)$ achieves

\[
\begin{cases}
\eta(x_k(t), x_{k-1}(t), \Delta, t, \Delta), \quad \tau \in [t, t+\Delta] & \text{if } \Delta + i\delta \le T \\
\eta(x_k(t), x_f, t_k + T - t, t, t_k + T - t), \quad \tau \in [t, t_k + T] & \text{otherwise}
\end{cases}
\]

3. Apply $u_k(t) = u^*_{t_k + i\delta}(t - t_k - i\delta)$ to the $k$th agent for $t \in [t_k + i\delta, t_k + (i+1)\delta)$ if $\Delta + i\delta \le T$, or for $t \in [t_k + i\delta, t_k + T)$ otherwise.

Repeat from step 2 until the $k$th agent reaches $x_f$.

Figure 3.4: A snapshot of the updating processes executed by the $k$th agent, shown at $t = t_k$, $t = t_k + \delta$ and $t = t_k + 2\delta$ (follower $x_k$, leader $x_{k-1}$).


This is a “sampled” version of local pursuit because agents are only required to update their trajectories a finite number of times. There are two adjustable parameters: the “following interval” $\Delta$ and the “updating interval” $\delta$. Usually we take $0 < \delta < \Delta$. We will refer to the times $t_k^i = t_k + i\delta$, $i = 0, 1, 2, 3, \ldots$ as the “updating times”. Notice that the SLP algorithm yields a well-defined trajectory $x_k(t)$ on $[0,T]$, given $x_{k-1}(t)$. The resulting trajectory is continuous but not necessarily smooth on the time interval $[t_k, t_k + T]$. A snapshot of the iterative updating process is illustrated in Fig. 3.4.
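The update rule can be sketched in a few lines for one special case. The sketch below assumes a planar problem in which the locally optimal trajectory between two points over a fixed time is the straight segment traversed at constant velocity, and it assumes that $T$, $\Delta$ and $\delta$ are all multiples of a common grid step. The names `slp_follower` and `path_length` and the bulged initial curve are hypothetical, chosen only for illustration; this is not the implementation used for the simulations reported later.

```python
import numpy as np

def slp_follower(leader, steps_delta, steps_Delta):
    """One generation of sampled local pursuit on a uniform time grid.

    leader      : (N+1, 2) float array, leader positions at local times 0..T
    steps_delta : grid steps per updating interval (delta)
    steps_Delta : grid steps per following interval (Delta)
    Assumes straight-line, constant-velocity local optima (illustrative only).
    """
    N = len(leader) - 1
    follower = np.empty_like(leader)
    follower[0] = leader[0]                 # every agent departs from x_0
    j0 = 0                                  # grid index of the current updating time
    while j0 < N:
        if j0 + steps_Delta <= N:
            # Leader still en route: aim at its current state, Delta ahead in local time.
            target = leader[j0 + steps_Delta]
            horizon = steps_Delta
            j1 = min(j0 + steps_delta, N)   # follow this plan for one updating interval
        else:
            # Leader has arrived: aim at x_f and run to the end of the horizon.
            target = leader[N]
            horizon = N - j0
            j1 = N
        step = (target - follower[j0]) / horizon   # displacement per grid step
        for j in range(j0 + 1, j1 + 1):
            follower[j] = follower[j0] + (j - j0) * step
        j0 = j1
    return follower

def path_length(traj):
    """Length of the sampled polyline."""
    return float(np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1)))

# T = 1 sampled with 200 steps; delta = 0.25 -> 50 steps, Delta = 0.5 -> 100 steps.
s = np.linspace(0.0, 1.0, 201)
x0 = np.column_stack([s, s + 0.3 * np.sin(np.pi * s)])   # hypothetical initial trajectory
trajs = [x0]
for _ in range(5):
    trajs.append(slp_follower(trajs[-1], steps_delta=50, steps_Delta=100))
lengths = [path_length(x) for x in trajs]
```

Each generation reuses only the leader's sampled positions, so the follower never needs the whole map, only the state it is currently aiming at.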

According to the SLP algorithm, agents leave the starting state $x_0$ one after another, each $\Delta$ units of time after its predecessor. That is, if the $(k-1)$th agent leaves the starting state at time $t_{k-1}$, the $k$th agent will leave it at $t_k = t_{k-1} + \Delta$. We assume the number of agents in the group is large and label each agent by an integer $k$ so that we can use $x_k(t)$ to denote the $k$th agent's trajectory^. Each agent moves to pursue its predecessor. If we call the $(k-1)$th agent the “leader” in this pursuit relationship, the $k$th agent is the “follower”. At each updating time $t = t_k + i\delta$, the follower calculates the optimal control $u_t^*(\tau)$ ($\tau \in [t, t+\Delta)$) that steers it from $x_k(t)$ to $x_{k-1}(t)$ over $\Delta$ units of time, i.e. from its current state to the leader's current state. Then during $[t_k + i\delta, t_k + (i+1)\delta]$, the follower moves along the trajectory driven by $u_t^*$, and the process repeats until the follower reaches $x_f$.

For notational convenience, we define the planned trajectories, denoted by $\hat{x}(t)$, to be the trajectories along which the follower plans to move at $t_k + i\delta$ but may not do so because it will update its future trajectory at $t_k + (i+1)\delta$. In other words, a planned trajectory is the trajectory driven by $u^*_{t_k + i\delta}$ over the time period $[t_k + (i+1)\delta, t_k + i\delta + \Delta]$; it may never be executed because the next updating result, $u^*_{t_k + (i+1)\delta}$, will drive the agent along a different trajectory. The realized trajectories, denoted by $x(t)$, are defined as the trajectories along which the follower actually moves. Referring to Fig. 3.4, the planned and realized trajectories are represented by the dashed and solid lines, respectively.

If we let $\delta \to 0$ in SLP, we obtain the following continuous local pursuit algorithm:

Algorithm 2 (Continuous Local Pursuit): Identify two states $x_0$ and $x_f$ on $B$. Let $x_0(t)$ ($t \in [0,T]$) be an initial trajectory satisfying Eq. (3.1) with $x_0(0) = x_0$, $x_0(T) = x_f$. Choose the following interval $\Delta$ such that $0 < \Delta < T$. Then follow the next rules for the $k$th agent.

1. For $k = 1, 2, 3, \ldots$, let $t_k = k\Delta$ be the starting time of the $k$th agent. Let $u_k(t) = 0$, $x_k(t) = x_0$ for $0 \le t \le t_k$.

2. Calculate $u_t^*(\tau)$ for all $t \in [t_k, t_k + T]$ such that $f(x_k(\tau), u_t^*(\tau)) = \dot{x}_t^*(\tau)$, where $x_t^*(\tau)$ achieves

\[
\begin{cases}
\eta(x_k(t), x_{k-1}(t), \Delta, t, \Delta), \quad \tau \in [t, t+\Delta] & \text{if } t \le t_k + T - \Delta \\
\eta(x_k(t), x_f, t_k + T - t, t, t_k + T - t), \quad \tau \in [t, t_k + T] & \text{otherwise}
\end{cases}
\]

3. Apply $u_k(t) = u_t^*(0)$ to the $k$th agent.

Repeat from step 2 until the $k$th agent reaches $x_f$.
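As a hedged illustration of step 3, consider the planar kinematic case $\dot{x}_k = u_k$ and assume a cost for which the optimal trajectory between two points over a fixed interval is the straight segment traversed at constant velocity (for example $\int \|u\|^2 dt$; this choice of cost is an assumption made here, not one stated in the text). Under that assumption the control applied by continuous local pursuit has a closed form:

```latex
u_k(t) = u_t^*(0) =
\begin{cases}
\dfrac{x_{k-1}(t) - x_k(t)}{\Delta}, & t \le t_k + T - \Delta, \\[2ex]
\dfrac{x_f - x_k(t)}{t_k + T - t}, & \text{otherwise}.
\end{cases}
```

In this special case each follower obeys a linear pursuit law driven only by its leader's current position, which is why no planned trajectory has to be stored.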

Due to the limitations of each agent's computing capability, it might be more expedient to apply sampled local pursuit (SLP), because the agents only need to update their trajectories a finite number of times instead of continuously as in CLP. However, CLP does not require storage of calculated results, so it is favored in situations where the update is easy to carry out.

^From now on, we will use $x_k(t)$ to denote both the $k$th agent and its trajectory.

If we are dealing with a free final time optimization problem, then the SLP and CLP algorithms must be altered so that agents optimize the trajectories connecting them pairwise with respect to both $u$ and the final time.

Continuous local pursuit is thus altered as follows^.

Algorithm 3 (Free Final Time Local Pursuit): In Algorithm 2, replace step 2 with:

2'. Calculate $u_t^*(\tau)$ for all $t \in [t_k, t_k + T]$ such that $f(x_k(\tau), u_t^*(\tau)) = \dot{x}_t^*(\tau)$, and $x_t^*(\tau)$ achieves $\eta_F(x_k(t), x_{k-1}(t), t, \Gamma)$ ($\tau \in [t, t + \Gamma]$), where $\eta_F$ is given by Eq. (3.8).

3.3  Algorithm  Advantages 

Each agent that participates in local pursuit is only required to calculate the optimal trajectory from itself to its nearby leader. Meanwhile, the “distance” between them can be limited by selecting an appropriate following interval $\Delta$. Therefore every agent only needs to sense the environment within a limited region while pursuing its leader. This is preferable to obtaining a global map via random exploration with limited sensor range.
For  example,  it  would  be  difficult  and  wasteful  for  a  single  robot  to  obtain  the  entire  map  of 
an  unknown  terrain.  Even  if  a  group  of  agents  can  be  dispersed  and  each  composes  a  map 
“patch”  around  itself,  it  is  not  guaranteed  that  the  composition  of  these  patches  covers  the 
whole  environment,  or  at  least  covers  the  region  containing  the  optimal  trajectory. 

Even  if  enough  patches  covering  the  entire  environment  have  been  collected,  the  fusion 
of  a  composite  map  still  requires  a  large  amount  of  information  communication.  A  powerful 
agent  is  also  needed  to  stitch  the  scattered  maps  using  sophisticated  fusion  algorithms. 
This means that at least one agent in the group must have enough memory, communication bandwidth and computing ability to deal with the collecting and fusing tasks concerning the entire environment. In contrast, in local pursuit, there is no requirement for agents to exchange
local  maps  that  they  sense.  Agents  only  have  to  communicate  in  very  limited  ways,  by  using 
vision  to  track  one  another  or  by  communicating  in  primitive  ways  to  signal  their  locations, 
e.g.  sound  or  radio  emission. 

Furthermore, even if an effective map could be obtained, solving optimization problems over a complicated environment, especially one containing different kinds of coordinate patches, requires a large amount of calculation. Examples include finding geodesics on a terrain with mountains and basins. The technique most often used in such situations is a numerical method. As we shall see later, using a numerical method over long distances may lead to a huge amount of calculation. Local pursuit, however, only requires computing optima within small regions, so fewer calculations are needed.

^In SLP, it cannot be guaranteed that at every updating time the minimum time $\Gamma$ to reach the leader is greater than or equal to the updating interval $\delta$. If $\Gamma < \delta$, then extra costs might be incurred. For this reason, we only develop Free Final Time Local Pursuit (FFTLP) in its continuous version, so that $\delta \le \Gamma$ is guaranteed.

In summary, local pursuit introduces a way to obtain the locally optimal trajectory^ over a long distance from many short pieces generated by an ordered sequence of identical agents, while requiring only local knowledge of the environment and the calculation of optimal trajectories within small regions. Thus, a complicated optimization problem could be solved by a group of cost-effective agents. The trade-off is that each agent must compute locally optimal trajectories more than once. However, deploying a group of cheap agents using local pursuit offers advantages in cost and reliability compared with achieving the same task with a single, expensive agent.

^This conclusion will be proved in the next chapter.

Chapter 4

Current  Progress 


In this chapter we investigate the collective behavior of a group evolving under local pursuit. Recall that each algorithm defines an ordered sequence of trajectories $\{x_k(t)\}$. The convergence of the sequence generated by SLP will be explored first. Then the limiting trajectory will be proved to be locally optimal; this is exactly the collective property we are seeking. Similar results will be established for CLP and FFTLP. Special cases concerning path length and time minimization will be introduced because of their prevalence in practice. Lastly, simulation experiments are provided to illustrate our results.

4.1  Results  on  Sampled  Local  Pursuit 

We would like to investigate the properties of the limiting trajectory generated by the group, i.e. $x_k(t)$ as $k \to \infty$. The convergence of the trajectories' cost will be explored first, then the convergence of the trajectories themselves, $\{x_k(t)\}$. After that, we show that the limiting trajectory of the sequence, denoted by $x_\infty(t)$, is locally optimal.

Lemma 4.1 (Convergence of Cost in SLP): Assume a group of agents $x_0, x_1, \ldots, x_k$ evolve under “Sampled Local Pursuit” with starting state $x_0$ and target state $x_f$. Suppose an initial control/trajectory pair $\{u_0(t), x_0(t)\}$ ($t \in [0,T]$), satisfying $x_0(0) = x_0$ and $x_0(T) = x_f$, is given. If the updating interval satisfies $0 < \delta < \Delta$, then the cost of the iterated trajectories will converge, i.e. $\lim_{k \to \infty} C(x_k, t_k, T)$ exists.

Sketch of Proof: Assuming the optimal control problem admits a solution, the cost of any trajectory satisfying the boundary conditions is bounded below. By investigating the pursuit process between $x_k(t)$ and $x_{k-1}(t)$ pairwise, we can prove that $C(x_k, t_k, T) \le C(x_{k-1}, t_{k-1}, T)$. This is enough to show the convergence of the sequence $\{C(x_k, t_k, T)\}$. See Appendix A.2 for the detailed proof. □

Figure 4.1: Sketch of the pairwise pursuit process in SLP.
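The convergence step in the sketch of Lemma 4.1 is the standard monotone convergence argument. Written out, with $c_k$ and $c_{\min}$ introduced here purely as shorthand (they are not symbols from the text), it reads:

```latex
c_k := C(x_k, t_k, T), \qquad
c_k \le c_{k-1} \ \ (k \ge 1), \qquad
c_k \ge c_{\min} \ \ \text{for all } k
\;\Longrightarrow\;
\lim_{k \to \infty} c_k = \inf_{k} c_k \ \ge\ c_{\min}.
```

The limit is only guaranteed to exist; it need not equal the lower bound $c_{\min}$, which is why further conditions are needed before the limiting trajectory can be called locally optimal.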


Nonetheless, the convergence of the trajectories' cost does not imply the convergence of the trajectories themselves. If there exist multiple locally optimal trajectories connecting the leader and follower at the updating times, then the convergence of trajectories is not guaranteed; Lemma 4.1 only defines an equivalence class of trajectories with the same cost.

If we restrict the pursuit process to take place within a “small” region by selecting $\Delta$ sufficiently small, i.e. agents follow close to one another, there will exist a unique locally optimal trajectory from the follower to the leader at every updating time $t_k + i\delta$. We then obtain the following result:

Lemma 4.2 (Uniqueness of the Limiting Trajectory): If at each updating time the locally optimal trajectory obtained through SLP is unique, then the limiting trajectory $x_\infty(t)$ is also unique.

Sketch of Proof: We will show that if there exists more than one trajectory that $x_k(t)$ might take, for $k$ large enough, then the cost of one trajectory must be less than that of the others. This contradicts Lemma 4.1, which shows that the limiting trajectories should have the same cost if they exist. See Appendix A.3 for the details of the proof. □


The locally optimal trajectories obtained at every updating time are smooth in many optimal control problems, e.g. the solution to the Euler-Lagrange equation in the calculus of variations. Nonetheless, $x_k(t)$ is only known to be piecewise smooth. For example, in $\mathbb{R}^2$ with $\dot{x}_k = u_k$, if the locally optimal trajectories are straight lines, $x_k(t)$ is not smooth because there is a corner at the joint of two segments. However, we can show that the limiting trajectory is smooth on the time interval $[0,T]$ if the locally optimal trajectories obtained at every updating time are smooth. The following definitions will be necessary for discussing the properties of the limiting trajectory.

Definition 4.1: Let $\gamma_1(t)$ and $\gamma_2(t)$ be trajectories of Eq. (3.1), defined on a time interval $I_1$ and another time interval $I_2$ respectively, where $I_1 \cap I_2 \ne \emptyset$. We say that $\gamma_1$ and $\gamma_2$ overlap if $\gamma_1(t) = \gamma_2(t)$ for all $t \in I_1 \cap I_2$.

Definition 4.2: Let $\gamma_1(t)$ and $\gamma_2(t)$ be trajectories of Eq. (3.1), defined on a time interval $I_1$ and another time interval $I_2$ respectively, where $I_1 \cap I_2 \ne \emptyset$. The composition of $\gamma_1(t)$ and $\gamma_2(t)$ on the interval $I_1 \cup I_2$ is defined as

\[
\gamma_1 \circ \gamma_2 \triangleq
\begin{cases}
\gamma_1(t), & t \in I_1,\ t \notin I_2 - I_1 \cap I_2 \\
\gamma_2(t), & t \notin I_1,\ t \in I_2 - I_1 \cap I_2
\end{cases}
\]

Lemma 4.3 (Smoothness of Composition): Suppose that in Lemma 4.1 the updating interval $\delta$ and the following interval $\Delta$ satisfy $0 < \delta < \Delta$. Then the planned trajectory $\hat{x}(t)$ and realized trajectory $x(t)$ of the limiting trajectory overlap. Furthermore, if the locally optimal trajectories obtained at every updating time are smooth, then the limiting trajectory is also smooth.

Sketch of Proof: We first show by contradiction that the planned trajectory and realized trajectory of $x_\infty(t)$ overlap. It is then shown that the limiting trajectory is piecewise smooth and that its neighboring segments overlap; the smoothness of the limiting trajectory over the entire time interval is an immediate consequence. See Appendix A.4 for the details of the proof. □

Before proceeding to the main theorem, we need to define the following condition.

Condition 4.1: Assume there exists an $\varepsilon > 0$ such that for all $a, b_1, b_2 \in B$ and all $\Delta > 0$, the optimal cost $\eta(a, b_1, \Delta, 0, \Delta)$ from $a$ to $b_1$ and $\eta(a, b_2, \Delta, 0, \Delta)$ from $a$ to $b_2$ satisfy

\[
\|b_1 - b_2\| < \varepsilon \ \Rightarrow\ \|\eta(a, b_1, \Delta, 0, \Delta) - \eta(a, b_2, \Delta, 0, \Delta)\| < C\Delta \tag{4.1}
\]

for some constant $C$ independent of $\Delta$.

A piecewise locally optimal trajectory is not necessarily optimal. However, the composition of overlapping locally optimal trajectories is locally optimal if Condition 4.1 is satisfied.

Lemma 4.4 (Composition of Optimal Trajectories): Let $\gamma_1(t)$ and $\gamma_2(t)$ be overlapping locally optimal trajectories defined on a time interval $I_1$ and another time interval $I_2$ respectively, where $I_1 \cap I_2 \ne \emptyset$. If Condition 4.1 is satisfied, then the composition $\gamma_1 \circ \gamma_2$ is locally optimal on $I_1 \cup I_2$.

Sketch of Proof: Suppose that the composition (call it $x^*(t)$) is not locally optimal. Then there must exist another trajectory (call it $x(t)$) nearby such that $\|x(t) - x^*(t)\|_\infty < \varepsilon$ and $C(x(t), 0, T) < C(x^*(t), 0, T)$. We can then use Condition 4.1 to obtain a contradiction, namely that $C(x(t), 0, T) \ge C(x^*(t), 0, T)$. See Appendix A.5 for the complete proof. □

The  next  theorem  is  an  immediate  consequence  of  the  above  lemmas. 

Theorem 4.1 (Sampled Local Pursuit): Suppose a group of agents $\{x_k\}$ evolve under sampled local pursuit and at each updating time $t = t_k + i\delta$ the locally optimal trajectory from $x_k(t)$ to $x_{k-1}(t)$ is unique. If the updating interval and following interval satisfy $0 < \delta < \Delta$ and Condition 4.1 is satisfied, then the trajectory sequence converges to a unique local optimum. Furthermore, if the locally optimal trajectories at every updating time are smooth, the limiting trajectory is also smooth.

Proof: From Lemma 4.2, the limiting trajectory is unique. We know that $x_\infty(t)$ ($t \in [0, \Delta)$) and $x_\infty(t)$ ($t \in [\delta, \delta + \Delta)$) are locally optimal because the realized and planned trajectories overlap (Lemma 4.3). The optimality of $x_\infty(t)$ ($t \in [0, \delta + \Delta)$) follows from Lemma 4.4. Repeating this argument on $[i\delta, i\delta + \Delta]$ ($i = 0, 1, 2, \ldots$) leads to the result that $x_\infty(t)$ ($t \in [0,T]$) is locally optimal. The proof of smoothness follows from a similar argument. □


4.2  Results  on  Continuous  Local  Pursuit 

In the case of continuous local pursuit, the follower keeps updating its movement at every $t \in [t_k, t_k + T]$, i.e. the updating interval $\delta \to 0$. As in sampled local pursuit, we assume the selection of $\Delta$ guarantees that at every updating time there is a unique optimal trajectory from the follower to the leader. We will first show that a single update to the leader's trajectory results in no more cost than what is incurred by the leader, no matter when the update occurs. Then we will explore the convergence of the trajectories' cost in CLP. The remaining arguments are quite similar to those discussed for SLP.

Lemma 4.5: Let $\lambda \in [0,T)$. Suppose that a follower replicates the leader's trajectory on $t \in [t_k, t_k + \lambda) \cup [t_k + \lambda + \Delta, t_k + T]$ if $\lambda \le T - \Delta$ (or $t \notin [t_k + \lambda, t_k + T]$ if $\lambda > T - \Delta$), while during $[t_k + \lambda, t_k + \lambda + \Delta]$ it follows the optimal trajectory joining $x_k(t_k + \lambda)$ and $x_{k-1}(t_k + \lambda)$ in $\Delta$ (or $T - \lambda$) units of time. Then the cost along the follower's trajectory will be no greater than the leader's.

Sketch of Proof: We investigate the cost along the follower's and the leader's trajectories, respectively. The overlapping parts of the two trajectories lead to equal costs, while the follower incurs no greater cost during $[t_k + \lambda, t_k + \lambda + \Delta]$. It follows that the whole cost along the follower's trajectory is no greater than the leader's. See Appendix A.6 for the complete proof. □

Figure 4.2: Sketch of a single update.
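The single update of Lemma 4.5 can be illustrated numerically under a simplifying assumption: the cost is taken to be Euclidean path length in the plane, so the optimal trajectory joining two states is the straight segment. The helper names `length` and `single_update` and the sine-shaped leader path are hypothetical, used only for this sketch.

```python
import numpy as np

def length(path):
    """Length of a sampled planar path (polyline)."""
    return float(np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1)))

def single_update(leader, a, b):
    """Copy the leader's samples, but join samples a and b by a straight segment."""
    follower = leader.copy()
    w = np.linspace(0.0, 1.0, b - a + 1)[:, None]   # interpolation weights from 0 to 1
    follower[a:b + 1] = (1.0 - w) * leader[a] + w * leader[b]
    return follower

s = np.linspace(0.0, 1.0, 101)
leader = np.column_stack([s, np.sin(2.0 * np.pi * s)])   # hypothetical leader path
follower = single_update(leader, 20, 60)                  # one shortcut, elsewhere a replica
```

Outside the shortcut the two paths coincide and contribute equal length, while inside it the straight segment cannot be longer than the leader's arc, which mirrors the argument in the sketch of proof.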

Lemma 4.6 (Convergence of Cost in CLP): In the case of continuous local pursuit, the cost of the iterated trajectories converges.

Sketch of Proof: The movement of an agent under CLP can be interpreted as the consequence of applying infinitely many “updates” to the leader's trajectory. From Lemma 4.5, each update leads to non-increasing cost, so infinitely many updates will also lead to less or equal cost for the follower. See Appendix A.7 for the details of the proof. □


Now  the  main  result  concerning  continuous  local  pursuit  can  be  derived  easily  by  an 
argument  similar  to  what  was  used  for  sampled  local  pursuit. 

Theorem 4.2 (Continuous Local Pursuit): Suppose a group of agents evolve under continuous local pursuit and that at every updating time $t$ the locally optimal trajectory from $x_k(t)$ to $x_{k-1}(t)$ is unique. Then the limiting trajectory obtained is unique and locally optimal. It is also smooth if the locally optimal trajectories calculated at every updating time are smooth.

Proof: First, we assume there are two different limiting trajectories $x_1(t)$ and $x_2(t)$. The proof of Lemma 4.2 shows that if there exist updates during the non-overlapping parts of successive trajectories $x_1(t)$ and $x_2(t)$, the whole cost along the follower will be less than the leader's, even in the case where infinitely many updates occur, because the number of updates does not change this property. If there exists more than one limiting trajectory, the decrease of cost from the leader to the follower contradicts the fact that all the limiting trajectories must have the same cost. Therefore the limiting trajectory is unique. It follows that $x_{k-1}(t - \Delta) = x_k(t)$ if $x_{k-1}(t) = x_\infty(t - t_{k-1})$. If we pick a $\delta_1$ such that $0 < \delta_1 < \Delta$, the limiting trajectory is piecewise smooth and locally optimal. Using the arguments in Lemma 4.3 and Lemma 4.4 we conclude that $x_\infty(t)$ is smooth and locally optimal over the entire time interval. □


4.3  Results  on  Free  Final  Time  Local  Pursuit 

Notice that Lemma 4.5 still holds for free final time local pursuit. The convergence of the trajectories' cost is easily obtained using arguments similar to those in Lemma 4.6. Adapting the argument of Theorem 4.2 to the free final time version yields the following result.

Theorem 4.3 (Free End-Time Local Pursuit): Suppose a group of agents evolve under free final time local pursuit and at every updating time $t$ the locally optimal trajectory from $x_k(t)$ to $x_{k-1}(t)$ with free final time is unique. Then the limiting trajectory is unique and locally optimal. It is also smooth if the locally optimal trajectories calculated at every updating time are smooth.

Proof:  The  proof  is  simple  and  will  be  omitted  here.  □ 


4.4  Summary 

So far, we have seen that each algorithm (SLP, CLP, FFTLP) generates an interesting “collective pattern”: the local optimum for the proposed optimal control problem. Although each agent only solves the optimal control problem within a small region (limited by $\Delta$), the trajectories generated by the agents are gradually optimized. Each agent “learns” from its predecessor, and the limiting trajectory exhibits the collective intellect of the group. Therefore, a complicated task (optimizing over a long distance) is separated into small tasks requiring less sensing, communication and computing capability.

Our algorithms fall into the category of “learning by repetition”. Newton's method and gradient methods are well-known examples in this category, and are usually applied to solve extremal problems in finite dimensional vector spaces [6]. Extensions of such methods to function spaces also enable the development of trajectory optimization algorithms through repetition. For example, the work of [40] used a gradient method to iteratively optimize the control for a specified dynamic system. The control $u(t)$ is derived by


\[
\frac{du}{dt} = \frac{\partial W(x,t)}{\partial x}\, X(x,u) \tag{4.2}
\]


where $X(x,u) = \dot{x}(t)$ are the system dynamics and $W(x,t)$ is the minimal cost of reaching the final state $x_f$ given that the initial state is $x(t_0) = x$. The iteration in Eq. (4.2) converges to the optimal control $u^*(t)$ and trajectory $x^*(t)$ if the optimal control is smooth.

However, existing algorithms usually require the cost function and the control to be partially differentiable. To proceed with the above algorithm, they also need to store and describe the entire $x_k$ in order to get $x_{k+1}$. Moreover, to obtain a smooth curve, infinitely small time increments are required, which introduces laborious calculations. All these factors hinder the application of these algorithms in decentralized systems whose members work cooperatively.

In contrast, our proposed algorithms are suitable for a large class of optimization problems and do not suffer from the above drawbacks. For example, our algorithms can be applied in situations where the control and trajectory are not smooth, such as bang-bang control. The computing requirement for each agent can be limited by defining an appropriate $\Delta$. Furthermore, each agent only needs very limited information about its predecessor, so multiple agents can work together effectively.


4.5  Special  Cases:  Length  and  Time  Minimization 

We have an additional interesting result for trajectory optimization problems that involve reaching a desired target state with minimum path length or end time. We state it as follows.

Theorem 4.4: If the time rate of change of the cost along a trajectory is independent of $x_k(t)$ for all $t$, then the minimum cost from the follower to the leader with free final time is strictly decreasing under local pursuit, unless the leader moves along a locally optimal trajectory.

Proof: Let $\rho(a,b) = J_F(x^*, \dot{x}^*, t)$ be the minimum cost to steer the system from state $a$ to another state $b$. For the pursuit process shown in Fig. 4.3, we have that

\[
\begin{aligned}
\rho(x_{k+1}(t+\delta), x_k(t+\delta)) &\le \rho(x_{k+1}(t+\delta), x_k(t)) + \rho(x_k(t), x_k(t+\delta)) \\
&\le \rho(x_{k+1}(t+\delta), x_k(t)) + C(x_k(t), t, \delta) \\
&= \rho(x_{k+1}(t+\delta), x_k(t)) + C(x_{k+1}(t), t, \delta) \\
&= \rho(x_{k+1}(t), x_k(t))
\end{aligned}
\tag{4.3}
\]

If the equalities hold in Eq. (4.3) then $x_k(t)$ must be moving along an optimal trajectory. □

Figure 4.3: The minimum cost from $x_{k+1}(t)$ to $x_k(t)$ is decreasing if $dC/dt$ is independent of the state.


This result has a variety of applications, e.g. the minimum time control problem

\[
J(x, \dot{x}, 0, T) = T, \quad \|u\| \le 1 \tag{4.4}
\]

whose solution could be obtained via the maximum principle; or the minimum path length problem with the condition that all agents move at unit speed

\[
J(x, \dot{x}, 0, T) = T \ \text{ with } \|\dot{x}\| = 1,\ T \text{ is free} \tag{4.5}
\]
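For the unit-speed case in Eq. (4.5), the monotone behaviour of the follower-to-leader distance can be checked numerically. The sketch below makes several assumptions that are not taken from the text: motion in the plane, a leader moving at unit speed on the unit circle, a follower that starts $\Delta = 0.5$ time units later from the same point, and a simple explicit time-stepping scheme with step `dt`.

```python
import numpy as np

dt, Delta, steps = 1e-3, 0.5, 2000

def leader(t):
    """Leader position at time t: unit-speed motion on the unit circle (assumed)."""
    return np.array([np.cos(t), np.sin(t)])

follower = leader(0.0)          # the follower departs from the leader's starting point
gaps = []                       # straight-line distance from follower to leader
for n in range(steps):
    target = leader(Delta + n * dt)          # the leader is Delta time units ahead
    r = target - follower
    gaps.append(float(np.linalg.norm(r)))
    # Unit-speed step aimed straight at the leader; valid while the gap exceeds dt.
    follower = follower + dt * r / gaps[-1]
```

Since the straight-line distance is the minimum path length between the two agents, the recorded gaps play the role of $\rho$ in the proof, and they never increase from one step to the next.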


4.6  Simulations 

We  now  present  some  simulation  results  concerning  application  of  local  pursuit  in  different 
optimal  control  problems. 

Sampled  Local  Pursuit 

To illustrate the effectiveness of sampled local pursuit, we solve the minimum path length problem on the plane with boundary conditions $x(0) = 0$, $x(1) = 1$. The optimal trajectory is a straight line. We set $\delta = 0.25$, $\Delta = 0.5$, $T = 1$. Fig. 4.4 shows 5 trajectories iterated by sampled local pursuit. The 5th trajectory is close to a straight line.

Figure 4.4: Iterated trajectories of the minimum length problem through SLP on the plane.


A Lagrangian Example

Fig. 4.5 illustrates the application of continuous local pursuit in systems with drift. Here the system dynamics are

\[
\dot{x}(t) + x(t) = u(t)
\]

and we want to minimize

\[
\int_0^1 \left( x(t)^2 + u(t)^2 \right) dt
\]

with $x(0) = 0$, $x(1) = 1$.

Figure 4.5: Iterated trajectories for the Lagrangian problem through CLP with $\Delta = 0.5$.

The locally optimal trajectory can be obtained through the Euler-Lagrange equation from the calculus of variations. The following interval $\Delta$ is set to 0.5. Fig. 4.5 shows that the trajectory sequence converges to the optimum.
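The Euler-Lagrange step can be worked out explicitly. The derivation below assumes that the dynamics read $\dot{x} + x = u$ and that the cost is integrated over $[0, 1]$; both are readings of the problem statement rather than facts confirmed elsewhere in the text.

```latex
\begin{align*}
L(x, \dot{x}) &= x^2 + (\dot{x} + x)^2
  && \text{(substituting } u = \dot{x} + x\text{)} \\
\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x}
  &= 2(\ddot{x} + \dot{x}) - (4x + 2\dot{x}) = 0
  \;\Longrightarrow\; \ddot{x} = 2x \\
x^*(t) &= \frac{\sinh(\sqrt{2}\,t)}{\sinh\sqrt{2}}
  && \text{(using } x(0) = 0,\ x(1) = 1\text{)} \\
x_{\mathrm{loc}}(\tau) &= \frac{a \sinh\!\big(\sqrt{2}\,(\Delta - \tau)\big) + b \sinh(\sqrt{2}\,\tau)}{\sinh(\sqrt{2}\,\Delta)}
  && \text{(local piece from } a \text{ at } \tau = 0 \text{ to } b \text{ at } \tau = \Delta\text{)}
\end{align*}
```

The last line is the piece a follower would compute at each update under these assumptions, with $a$ its own state and $b$ the leader's state.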

Minimum Time Control

Consider the following second-order system

\[
\ddot{x} = u, \quad \|u\| \le 1
\]

We want to minimize the cost $J(x, \dot{x}, 0, T) = T$ with the boundary conditions $x(0) = \pi$, $x(T) = 0$. From the maximum principle it is well known that the optimal control for this problem is the bang-bang control

\[
u^*(t) =
\begin{cases}
-1 & \text{if } t \in [0, T/2) \\
1 & \text{if } t \in [T/2, T]
\end{cases}
\tag{4.6}
\]

With the following interval $\Delta = 0.5\pi$, as Fig. 4.6 illustrates, the trajectory of the 6th agent is essentially under optimal control, which means that convergence is rapid.
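The bang-bang law in Eq. (4.6) can be checked with a short simulation. The sketch assumes that the agent starts and ends at rest, since the velocity boundary values are not stated explicitly; under that assumption the minimum time for a displacement of $\pi$ with $\|u\| \le 1$ is $T = 2\sqrt{\pi}$. The function name `simulate_bang_bang` is hypothetical.

```python
import math

def simulate_bang_bang(x0=math.pi, n_steps=1000):
    """Integrate x'' = u with u = -1 on [0, T/2) and u = +1 on [T/2, T].

    Assumes zero initial velocity (not stated in the text); n_steps must be even
    so that the switching time T/2 falls on a step boundary.
    """
    T = 2.0 * math.sqrt(x0)          # rest-to-rest minimum time for |u| <= 1
    dt = T / n_steps
    x, v = x0, 0.0
    for n in range(n_steps):
        u = -1.0 if n < n_steps // 2 else 1.0
        x += v * dt + 0.5 * u * dt * dt   # exact update for piecewise-constant u
        v += u * dt
    return T, x, v

T, x_T, v_T = simulate_bang_bang()
```

The simulated state returns to rest at the origin at time $T$, consistent with a single switch at $T/2$.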


Figure 4.6: Iterated trajectories for the minimum time control problem through FFTLP with $\Delta = 0.5\pi$ (both panels plotted against time in seconds).


Geodesic  Discovery 

Now we show a geodesic discovery example that involves complicated calculation over the entire environment but is relatively simple within local patches. This example simulates two hills by two cones. The starting state is (3500, 0, 0) and the end state is (−1300, 0, 0). The first agent moves on a trajectory that follows the border of the cones. The geodesic over a large distance is not easy to compute, not only because it demands knowledge of the entire map but also because there are 4 coordinate switches along the path^.

Figure 4.7: Iterated trajectories for the geodesic discovery problem through CLP.

However, if we set $\Delta = 0.2T$, the follower is required to handle at most one coordinate switch per calculation, so the amount of calculation at every step is reduced compared to computing over the whole map. As Fig. 4.7 illustrates, the iterated trajectories converge to the optimum.


^If  applied  numerical  method  over  the  entire  map,  the  number  of  the  time  segments  is  4. 


Chapter  5 


Ongoing  Work 


In this chapter we discuss some ongoing research directions related to local pursuit,
listed as follows.

• A large category of optimal control problems involves a free final state or “point-to-set”
boundary conditions, as opposed to point boundary conditions. We would like to generalize
our pursuit algorithm to such problems.

• The limiting trajectory obtained from local pursuit may converge to a global optimum
or a local optimum, depending on the parameters of the algorithm. We will explore
that dependence and determine which optimum the trajectory sequence converges to.

• The performance of local pursuit with noisy measurements will be considered because of its
relevance in practice. We want to know whether the agents can estimate the solution
in the absence of precise sensor readings.

• We will explore the advantages of local pursuit in the numerical computation of optimal
trajectories.

Finally, we will look into the development of other biologically inspired algorithms for
complicated tasks in engineering or other fields. The potential tasks and ongoing steps are
outlined next.

5.1  Optimal  Control  Problems  with  Free  Final  State 

Many optimal control problems with fixed final time include a penalty on the final state but
do not impose any constraints on it, i.e.


(5.1)

Examples include LQR problems, which can be solved by introducing a feedback
control and a Riccati equation.

We would like to modify local pursuit so that it applies to this class of problems, if possible.
Recall that local pursuit gradually improves an initial solution: if the cost incurred by each
agent is no greater than that of its predecessor, the trajectory sequence will converge to a
local optimum. However, if agents always move on locally optimal trajectories from themselves
to their predecessors, we obtain a trajectory sequence with non-increasing cost, but the end
point is determined by the first agent and is not necessarily the best choice. We should have
some freedom in choosing the final state instead of fixing it by simply catching up with the
leader’s position. On the other hand, if at every updating step the follower solves an optimal
control problem with free final state, then it does not need the leader. The follower can
determine the locally optimal trajectory from its current state and Δ alone, so the agents are
totally independent. Such aimless pursuit will not let the follower “learn” from the leader,
and we cannot guarantee that the follower does better than the leader.

Based on the above considerations, we will let the follower “catch” the leader before the
leader reaches the final state, i.e. the follower will solve an optimal control problem with
fixed end point at every update time $t_k + i\delta$ for which $i\delta < T - \Delta$. After the
leader reaches the final state, the follower will solve an optimal control problem with free
final state. By dividing the time into two different phases, “catching up” and “free running”,
the follower has the potential of “learning” from the leader as well as choosing the best final
state. The trajectory sequence is expected to be gradually optimized through learning while it
also benefits from the freedom in the final state.

As before, we define the cost of a segment of an optimal trajectory over $[t_0, t_0 + T]$ as:


(5.2)


where $x^*(t)$ minimizes Eq. (5.1) subject to $x^*(t_0) = a$. We set up an algorithm similar
to SLP, except that step 2 is replaced by:

2. When $t = t_k + i\delta$, $i = 0, 1, 2, 3, \ldots$, calculate $u_k^*(\tau)$ such that
$f(x_k^*(\tau), u_k^*(\tau)) = \dot{x}_k^*(\tau)$, where $x_k^*(\tau)$ achieves

\[ \begin{cases} \eta(x_k(t), x_{k-1}(t), \Delta, t, \Delta), \quad \tau \in [t, t + \Delta] & \text{if } \Delta + i\delta < T \\ \eta_{fs}(x_k(t), t_k + T - t, t), \quad \tau \in [t, t_k + T] & \text{otherwise} \end{cases} \]
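The switching rule in the modified step 2 above can be sketched as a small scheduling routine.
This is only an illustration of the case distinction (Δ + iδ < T versus otherwise); the
function names are hypothetical and the optimal control problems themselves are not solved here.

```python
def pursuit_phase(i, delta, Delta, T):
    """Return the problem type the follower solves at update time t_k + i * delta."""
    if Delta + i * delta < T:
        return "catching up"   # fixed end point: the leader's current state
    return "free running"      # free final state over the remaining time

def phase_schedule(delta, Delta, T):
    """List (time offset from t_k, phase) for every update with i * delta < T."""
    schedule = []
    i = 0
    while i * delta < T:
        schedule.append((i * delta, pursuit_phase(i, delta, Delta, T)))
        i += 1
    return schedule

# Illustrative values only: delta = 1, Delta = 2, T = 5.
example = phase_schedule(1, 2, 5)
```

For these illustrative values the follower catches up at offsets 0, 1 and 2 and runs freely at
offsets 3 and 4.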


If the final state is not free but restricted to a set, it should satisfy the final condition

\[ q(x(t_0 + T)) = 0, \quad T \text{ is free} \qquad (5.3) \]

For example, if the final state is located on the unit circle, the condition is
$\|x(t_0 + T)\| = 1$. From optimal control theory, we know that the best final time and state
can be determined by the transversality condition.

We also define the cost of a segment of an optimal trajectory over $[t_0, t_0 + T]$ as:

rto+T 

r]fsg{a,T,to)  =  Q{x{to +  T))  +  (5.4) 

Jto 

where $x^*(t)$ is the optimal trajectory for the cost in Eq. (5.1) that satisfies
$x^*(t_0) = a$, $q(x^*(t_0 + T)) = 0$.

Of course, we need both the “catching up” and “free running” phases when applying local pursuit
to such problems. We set up the algorithm in the same way as CLP, except that step 2 is
replaced by:

2. Calculate $u_k^*(\tau)$ for all $t \in [t_k, t_k + T]$ such that
$f(x_k^*(\tau), u_k^*(\tau)) = \dot{x}_k^*(\tau)$, where $x_k^*(\tau)$ achieves

\[ \begin{cases} \eta(x_k(t), x_{k-1}(t), \Delta, t, \Delta), \quad \tau \in [t, t + \Delta] & \text{if } t < t_k + T - \Delta \\ \eta_{fsg}(x_k(t), t_k + T - t, t), \quad \tau \in [t, t_k + T] & \text{otherwise} \end{cases} \]

The remaining work is to prove the optimality of the limiting trajectory obtained from
the above algorithms. We may follow steps similar to those used for SLP and CLP:

1. Proving the convergence of the cost incurred by the trajectories.

2. Proving the uniqueness of the limiting trajectory.

3. Proving the optimality of the composition of two segments of locally optimal trajectories.

4. Proving the local optimality of the limiting trajectory.


5.2  Convergence  to  Global  vs  Local  Optimum 

The limiting trajectory in local pursuit is determined by the parameters Δ, δ and the initial
trajectory $x_0(t)$. As we shall see, different parameters may result in reaching different local
optima. $x_0(t)$ is the initial trajectory generated by estimation or random exploration, and
is not determined by the algorithms themselves.

There is an obvious trade-off in choosing Δ: large values of Δ may place significant
demands on each agent’s capabilities of sensing, communicating and computing; however,
a large Δ will also generally result in faster convergence and in the ability of local pursuit to
“escape” local optima. Because of space limitations we restrict our discussion to the
following example*.

• If pursuit takes place on a surface with “holes” or “obstacles” and the initial feasible
path winds around the obstacles, the iterated trajectories may converge to a global
optimum instead of a local one when Δ is large. For example, if Δ is greater than
1/2 the perimeter of the largest circle that surrounds the holes and all agents run at
unit speed, then the iterated trajectories converge to the global optimum.

* In this section, all the examples deal with problems of minimizing the path length.


Figure 5.1: Larger δ may lead to a better result.


The parameter δ is much easier to adjust because the only requirement is that 0 < δ < Δ.
Nonetheless, there seems to be no simple relationship between the group’s performance and δ.
A smaller δ means more frequent updates and might be expected to bring a better result; in
fact, however, it may lead to a local optimum instead of the global optimum. This can be seen
from the following example.

• Suppose pursuit takes place on a plane with a hole of unit radius, and each agent moves at
unit speed. Let the first agent move counterclockwise along one local minimum from
S to E, as illustrated on the left of Fig. 5.1. Δ is set to 3.1416 (a little more
than π). Then the simulation shows that for some δ, e.g. δ = 2.5, all the followers
travel along the same trajectory as the leader, and for some δ, e.g. δ = 3, the limiting
trajectory converges to the global optimum. Here we see that a larger δ leads to a better
result. The two limiting trajectories are contrasted in Fig. 5.1.

We see that carefully selected parameters may lead to the global optimum. Directly
determining the parameters that lead to a desired local optimum seems difficult. However,
given the parameters of an algorithm and a desired local optimum, we can investigate
whether the trajectory sequence will converge to it. We will proceed using Lyapunov’s
method.

Recall that an agent’s trajectory, $x_{k+1}(t)$, is determined by the algorithm’s parameters
once its leader’s trajectory $x_k(t)$ is given. At every updating time, the locally optimal
trajectory is determined by the starting state, the end state and Δ, so we can assume that
the optimal trajectory minimizing the cost J is given by the mapping:

\[ x^*(\tau) = h(a, b, \Delta, \tau), \quad \tau \in [t, t + \Delta] \qquad (5.5) \]

with the boundary conditions $x(t) = a$, $x(t + \Delta) = b$. We assume that the mapping
$h : D \times D \times \mathbb{R}^+ \times \mathbb{R} \to D \times I$ (I is a time interval)
defines a continuous trajectory. Given fixed Δ and δ (assume we start with SLP), and denoting
the kth trajectory by $x_k(t)$ ($t \in [t_k, t_k + T]$), the (k + 1)th trajectory is


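\[ x_{k+1}(t) = \begin{cases} h(x_{k+1}(t_{k+1} + j\delta), x_k(t_{k+1} + \Delta + j\delta), \Delta, t) & \text{if } t \in [t_{k+1}, t_{k+1} + i\delta] \\ h(x_{k+1}(t_{k+1} + i\delta), x_k(t_k + T), T - (i + 1)\delta, t) & \text{otherwise} \end{cases} \qquad (5.6) \]

where the integer i satisfies $i\delta < T - \Delta$ and $(i + 1)\delta \ge T - \Delta$.
For simplicity, we can write Eq. (5.6) as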


\[ x_{k+1}(t) = f_p(x_k(t), \Delta, \delta, t), \quad t \in [0, T] \qquad (5.7) \]

where $f_p : D \times [0,T] \times \mathbb{R}^+ \times \mathbb{R} \to D \times [0,T]$ is a
continuous function. This is similar to the state equation of a dynamic system, if we think of
every trajectory as a state in the space of trajectories ($D \times [0,T]$).

Noting that Lyapunov’s method is commonly applied in analyzing the convergence properties of
dynamic systems, we plan to construct a Lyapunov function in the space of trajectories. The
constructed Lyapunov function should satisfy the following conditions:

\[ V(x^*(t)) = 0 \text{ and } V(x_k(t)) > 0 \text{ in } D \times [0,T] - \{x^*(t)\} \qquad (5.8) \]

\[ V(x_{k+1}(t)) - V(x_k(t)) \le -\rho_k < 0 \text{ for } x_k(t) \in D \times [0,T] - \{x^*(t)\} \qquad (5.9) \]

where $x^*(t)$ is the predetermined local optimum and $\rho_k \to 0$ only if
$x_k(t) \to x^*(t)$. If we can find such a Lyapunov function, we can conclude that the
trajectory sequence generated by local pursuit and started from an initial trajectory
$x_0(t) \in D \times [0,T]$ will converge to $x^*(t)$. Furthermore, by finding a region in the
space of trajectories where the Lyapunov function satisfies conditions (5.8) and (5.9) and is
bounded above, we expect to find the region of attraction of this local optimum.
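To make conditions (5.8) and (5.9) concrete, the sketch below tests them numerically on a finite
sequence of sampled scalar trajectories. The candidate function V (the sup-norm distance to a
reference trajectory) and all names are assumptions made for illustration; a finite check of
this kind can only falsify a candidate, it cannot prove convergence.

```python
def sup_distance(traj, ref):
    """Candidate V: largest sample-wise distance between a trajectory and the reference."""
    return max(abs(a - b) for a, b in zip(traj, ref))

def satisfies_decrease(trajectories, ref, tol=1e-12):
    """Check V(x_k) > 0 and V(x_{k+1}) < V(x_k) along a finite sequence of trajectories."""
    values = [sup_distance(x, ref) for x in trajectories]
    positive = all(v > tol for v in values)
    decreasing = all(b < a for a, b in zip(values, values[1:]))
    return positive and decreasing, values

# Hypothetical data: a reference trajectory and iterates whose deviation halves each time.
ref = [0.0, 0.5, 1.0]
bump = [0.0, 1.0, 0.0]
iterates = [[r + (0.5 ** k) * b for r, b in zip(ref, bump)] for k in range(5)]
ok, values = satisfies_decrease(iterates, ref)
```

For this hypothetical sequence V takes the values 1, 0.5, 0.25, 0.125 and 0.0625, so both
conditions hold on the sampled iterates.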


5.3  Pursuit  with  Noisy  Measurements 

In the real world, the sensors and actuators embedded in robots are not perfect and their
operation is often distorted by noise. Several key points help in understanding this
uncertainty [21]:

• Sensors only deliver uncertain values in practice. At best they deliver an approximation
of what they are measuring. Disturbances in the environment, differences between
physical parts, and the measuring mechanism also introduce unexpected errors for every
sensor. Moreover, sensors do not deliver direct descriptions of the world. They do
not identify objects, nor do they separate the effects of their own motion from those of
the objects’ motion. Therefore we can hardly obtain an accurate model of a real sensor.

• Commands to actuators can have uncertain effects. Many layers of refinement may
be performed before high-level action commands become appropriate motor currents,
and each may introduce uncertainty. Depending on the accuracy of the hardware and software,
errors can accumulate rapidly. These uncertainties make it difficult to model actuators
accurately.

What we want to investigate is the collective behavior of the system when the measurements
made by agents are subject to noise. We would like to develop algorithms that
work not only well but also robustly. For simplicity, we may consider the noise
of sensors and actuators together and model the noise in a generic, abstract context as

\[ x(t) = x^*(t) + \xi(x)\,\omega(t) \qquad (5.10) \]

where $x^*(t)$ is the actual optimal trajectory, $\xi(x)$ is a real-valued function and
$\omega(t)$ is a white Gaussian process with given mean and variance. We are interested in
investigating the limiting trajectory and determining its distribution.

Another source of uncertainty comes from the estimation of optima, when precise solutions
for locally optimal trajectories are impossible to obtain even though all measurements
and models are perfect. For example, for an uneven terrain that cannot be described by
any existing geometric object, it is hard to obtain an analytical solution for the geodesics
on it. However, sometimes we can estimate the solution with bounded error through numerical
methods or other simple rules, by investigating properties of the environment and the
optimal solution. The error of the local estimate is related to the “following distance” between
the leader and the follower: the smaller the distance, the more precise the estimate. So in this
case the locally optimal trajectories that agents obtain are as follows:

\[ x(t) = x^*(t) + e(t), \quad \|e(t)\|_\infty < \varepsilon \qquad (5.11) \]

and we are interested in finding the range of the limiting trajectory’s error. If the error of the
limiting trajectory is bounded (depending on ε), then we have found a method to relate
the local trajectories’ error to the entire trajectory’s error.

In order to proceed, the following steps will be considered:

1. Building a model of locally optimal trajectories at updating times, as in Eqs. (5.10)
and (5.11).

2. Investigating the evolution of each pair of leader and follower under noisy measurements
and the evolution of the trajectory’s error through the pursuit process.

3. Finding the error of the limiting trajectory.
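The two error models discussed in this section (a state-dependent Gaussian perturbation and a
uniformly bounded estimation error) can be sketched on sampled scalar trajectories as follows.
The function names, the uniform distribution used for the bounded error, and the evaluation of
the scaling function at the nominal state are assumptions of this sketch, not part of the
proposed analysis.

```python
import random

def gaussian_noise_trajectory(x_star, scale, mean, std, rng):
    """Sampled version of x(t) = x*(t) + scale(x) * w(t), with w(t) i.i.d. Gaussian samples.
    The scaling function is evaluated at the nominal state x*(t) (an assumption)."""
    return [x + scale(x) * rng.gauss(mean, std) for x in x_star]

def bounded_error_trajectory(x_star, eps, rng):
    """Sampled version of x(t) = x*(t) + e(t) with |e(t)| bounded by eps at every sample.
    A uniform error is used only for illustration."""
    return [x + rng.uniform(-eps, eps) for x in x_star]

rng = random.Random(0)                      # fixed seed so the sketch is reproducible
x_star = [0.1 * i for i in range(11)]       # hypothetical nominal trajectory samples
noisy = bounded_error_trajectory(x_star, 0.05, rng)
sup_err = max(abs(a - b) for a, b in zip(noisy, x_star))
exact = gaussian_noise_trajectory(x_star, lambda x: 1.0 + abs(x), 0.0, 0.0, rng)
```

With zero mean and zero standard deviation the Gaussian model reproduces the nominal trajectory,
and the bounded model never deviates from it by more than ε.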

5.4  Application  in  Numerical  Computation  of  Optimal 
Control 

The algorithms stated here can potentially lead to advances in the numerical computation of
optimal trajectories for control systems. Numerical methods, including Newton’s method
and gradient methods, are commonly applied optimization methods. An obvious drawback
of ordinary numerical methods is that they require a large amount of calculation because they
optimize the result iteratively.

For example, the multiple shooting method is widely used in difficult applications,
e.g. the fuel optimization problem for spaceships [9]. As Betts summarized in [8], “the
fundamental idea of multiple shooting is to break the trajectory into shorter pieces or
segments”. The time domain is broken into smaller intervals of the form
$t_0 < t_1 < \cdots < t_M = t_f$. The initial value of the dynamic variable at
the beginning of each segment is denoted by $v_j$ for $j = 0, 1, \ldots, (M - 1)$, and the
variable obtained by solving the system equation from $t_j$ to $t_{j+1}$ is denoted by
$\bar{v}_j$. The nonlinear programming (NLP) variables are defined as
$x = [v_0, v_1, \ldots, v_{M-1}]$, and the constraints for the NLP are

\[ c(x) = \begin{bmatrix} v_1 - \bar{v}_0 \\ v_2 - \bar{v}_1 \\ \vdots \\ \phi(v_M, t_f) \end{bmatrix} = 0 \]

where $\phi(v_M, t_f) = 0$ is the boundary condition. The number of NLP variables and
constraints is $n = n_v M$, where $n_v$ is the dimension of the dynamic variable $v$ and M is
the number of segments [8] [13]. The problem of minimizing the cost function $F(x)$ can then be
solved by introducing the Lagrangian

\[ L(x, \lambda) = F(x) - \lambda^T c(x) \]

Necessary conditions for the variable $[x^*, \lambda^*]$ to be an optimum are defined by

\[ \nabla_x L(x, \lambda) = 0 \]

\[ \nabla_\lambda L(x, \lambda) = 0 \]

When proceeding with the iteration, the dimension of the Jacobian matrix is
$n \times n = n_v M \times n_v M$. Here we see that the number of segments involved in the
calculation affects the “degree of labor-consumption” at least on the order of $O(n^2)$.
Moreover, increasing the variable size leads to more iteration steps. If fewer segments are
introduced during the calculation, the computational complexity can be decreased.
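The size argument above can be illustrated with a few lines of arithmetic. The sketch assumes
a dense Jacobian of size n × n with n = n_v M, as stated in the text; the function name and the
example dimensions are hypothetical.

```python
def nlp_size(n_v, num_segments):
    """Return (n, n * n): the number of NLP variables n = n_v * M and the number of
    entries in a dense n-by-n Jacobian."""
    n = n_v * num_segments
    return n, n * n

# Hypothetical comparison: a 3-dimensional dynamic variable with 10 segments versus 2 segments.
full_problem = nlp_size(3, 10)
local_problem = nlp_size(3, 2)
```

In this hypothetical comparison, reducing the number of segments from 10 to 2 shrinks the dense
Jacobian from 900 entries to 36.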

Another obvious example of “more time segments lead to increased complexity” is
dynamic programming. Bellman introduced the Hamilton-Jacobi-Bellman (HJB) equation
to describe the optimal control $u^*(x,t)$ as well as the cost-to-go function $J^*(x,t)$ for all
possible initial conditions [17, 14]. The HJB theory plays an important role in the field of
optimal control because it provides a sufficient condition for optimality, as opposed to the
necessary conditions obtained from ordinary optimization methods [8]. However, the drawback
of dynamic programming is the “curse of dimensionality”, as Bellman himself called it. Even
a moderately complicated problem involves an enormous amount of storage
[15]. This drawback can be seen in the discrete example of the
shortest path problem [16] [17], as illustrated by the trellis diagram in Fig. 5.2. The worst
case involves investigating paths and storing $n_x M$ data when proceeding backward from
the end point to the starting point, where $n_x$ is the dimension of the state x and M is the
number of segments. Dynamic programming with large M is often infeasible because of
the agent’s limited physical memory.
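The backward recursion on a trellis can be sketched as follows. This is a minimal illustration
under stated assumptions: every stage has the same number of nodes, the terminal cost is zero
for every end node, and the names `trellis_shortest_path` and `costs` are hypothetical. The
table of cost-to-go values it builds grows with the number of stages times the number of nodes
per stage, which is the storage burden discussed above.

```python
def trellis_shortest_path(costs, start):
    """Backward dynamic programming on a trellis.

    costs[s][i][j] is the cost of moving from node i at stage s to node j at stage s + 1.
    Returns (optimal cost from `start`, node index at each stage, stored cost-to-go entries).
    """
    num_stages = len(costs) + 1
    width = len(costs[0])
    cost_to_go = [[0.0] * width for _ in range(num_stages)]   # terminal cost assumed zero
    best_next = [[None] * width for _ in range(num_stages - 1)]
    for s in range(num_stages - 2, -1, -1):
        for i in range(width):
            options = [(costs[s][i][j] + cost_to_go[s + 1][j], j) for j in range(width)]
            cost_to_go[s][i], best_next[s][i] = min(options)
    path = [start]
    for s in range(num_stages - 1):
        path.append(best_next[s][path[-1]])
    stored = sum(len(row) for row in cost_to_go)
    return cost_to_go[0][start], path, stored

# Small hypothetical trellis: 3 stages with 2 nodes each.
costs = [
    [[1, 4], [2, 1]],
    [[3, 1], [1, 5]],
]
best_cost, best_path, stored = trellis_shortest_path(costs, start=0)
```

For this small trellis the cheapest route from node 0 passes through node 0 and then node 1,
with total cost 2, and six cost-to-go values are stored.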


Figure 5.2: The trellis diagram of the shortest path problem.

We would like to utilize numerical methods with less computational complexity. One idea is
to decrease the number of time segments in the computation. Fewer segments mean that
optimization is only executed in smaller regions, which coincides with the idea
of finding optima within small regions, as stated before. Therefore we plan to apply local
pursuit in numerical methods and investigate the potential advantages, such as
a decrease in the physical requirements for each agent and in the “degree of labor-consumption”
for the group. To complete the argument, we should consider the following steps:

1. Applying local pursuit in numerical methods to solve some optimal control problems.
Investigating a single updating process to determine the requirements for an individual
agent to execute the algorithm, e.g. the size of storage and the complexity of computing.

2. Trying to find the number of iterations needed to reach a satisfactory result, e.g. to
determine the k such that

\[ \|x_k(t) - x^*(t)\|_\infty < \varepsilon \qquad (5.12) \]

3. Investigating the requirements and computational complexity of the numerical methods
ordinarily applied to the same problems.

4. Comparing the two kinds of numerical methods.

5.5  Other  Algorithms  Inspired  by  Biology 

Besides ants, other social insects, e.g. worker honey bees, exhibit many group
activities with remarkably coordinated behaviors. The intrinsic mechanisms have been partly
revealed by some effective models of such activities. We are considering ways to “borrow”
the rules that govern the behaviors of insects and to develop additional biologically inspired
algorithms for problems in engineering. Some potential topics are as follows:

• The foraging activities of worker honey bees [1] can provide us with some clues for
solving resource allocation problems, which have numerous applications in engineering,
economics and operations research, e.g. routing a group of taxis to pick up and deliver
passengers whose appearances are dynamic or random, or arranging a limited number of
robots to execute several manufacturing processes.

• The work of [3] presented a model of how ants select ongoing foraging zones. According
to this model, each ant initially has a uniform distribution over all foraging zones.
Let the probability of selecting foraging zone i at time t be $P_i(t)$. If at time t the ant
finds food in zone i, then $P_i(t + 1) = P_i(t) + \min(P^{+}, 1 - P_i(t))$, where
$P^{+}$ is a constant indicating the relative importance of “learning”. If not, then the
probability $P_i(t + 1)$ is decreased. By this mechanism, both an individual ant and
a colony of ants will evolve toward an optimal spatial distribution over foraging zones,
obtaining maximum food when the appearance of food in each zone is random and unknown to
the ants. In engineering, this method of “learning” is helpful, especially when there
are unknown factors. For example, we may want to use a limited number of controllers
to stabilize multiple plants, where each plant deviates from its equilibrium position
according to an unknown distribution and we want to minimize the sum of deviations.
It is promising that we can let the controllers learn the distribution of each plant and
develop decentralized rules for each controller.
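The reinforcement rule described in the second topic above can be sketched as follows. Only the
increase on success, P_i(t) + min(increment, 1 - P_i(t)), is taken from the text. The size of
the decrease on failure, the renormalization step and all names are assumptions made so that
the example is runnable.

```python
def update_zone_probabilities(probs, zone, found_food, p_plus=0.1, p_minus=0.05):
    """Return updated zone-selection probabilities after one foraging trip.

    Success: the visited zone gains min(p_plus, 1 - p), as in the rule quoted above.
    Failure: the visited zone loses min(p_minus, p) (assumed form of the decrease).
    The result is renormalized to sum to one (also an assumption of this sketch).
    """
    p = list(probs)
    if found_food:
        p[zone] += min(p_plus, 1.0 - p[zone])
    else:
        p[zone] -= min(p_minus, p[zone])
    total = sum(p)
    if total == 0.0:
        return [1.0 / len(p)] * len(p)   # fall back to the uniform distribution
    return [v / total for v in p]

uniform = [0.25, 0.25, 0.25, 0.25]
after_success = update_zone_probabilities(uniform, zone=0, found_food=True)
after_failure = update_zone_probabilities(uniform, zone=0, found_food=False)
```

Starting from the uniform distribution, a successful trip raises the probability of returning
to the visited zone and an unsuccessful one lowers it, while the probabilities continue to sum
to one.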

In order to successfully complete the proposed research, the following specific steps
are to be taken:

1. Finding some engineering or economic problems with properties similar to a social
insect activity and constructing an effective model of the insect activities. Many works
have discussed models of social behavior in insects. We will stress those that appear
to have the simplest rules.

2. Abstracting the rules that govern the communication and motion behaviors of insects and
embedding them into artificial collectives in order to solve the proposed tasks.

3. Showing the effectiveness of the proposed algorithm by analysis or simulation.

Appendix A


Proofs

A.1 Preliminaries

The following facts can be derived easily from the properties of optimal trajectories and are
helpful in later arguments.

Facts: Let $\eta$, $C$, $\eta_F$ be as defined in Eqs. (3.4), (3.5), (3.8), let $x_k(t)$ be a
trajectory of Eq. (3.1) and $x^*(t)$ an optimal trajectory of Eq. (3.3) or Eq. (3.7). Then the
following properties hold:

1. $\eta(a, b, T, t_0, \sigma) \le C(x_k, t_0, \sigma)$ for any $x_k$ with
$x_k(t_0) = x^*(t_0)$, $x_k(t_0 + \sigma) = x^*(t_0 + \sigma)$, where $x^*(t)$ satisfies Eq. (3.3).

2. $\eta(a, c, T, t_0, T) \le \eta(a, b, \sigma, t_0, \sigma) + \eta(b, c, T - \sigma, t_0 + \sigma, T - \sigma)$

3. $C(x_k, t_0, T) = C(x_k, t_0, \sigma) + C(x_k, t_0 + \sigma, T - \sigma)$

4. $\eta_F(a, b, t_0, \sigma) \le \eta(a, b, T, t_0, \sigma)$

5. $\eta(a, b, T, t_0, \sigma) = C(x^*, t_0, \sigma)$ where $x^*(t)$ satisfies Eq. (3.3).

A.2 Proof of Lemma 4.1

It is enough to show that the cost of the iterated trajectories is non-increasing with k. Consider
the pursuit process between the (k - 1)th and kth agents. As shown in Fig. A.1, the
dotted line, denoted by $x_{k-1}(t)$ on $[t_{k-1}, t_{k-1} + T]$, indicates the leader’s path.
The solid lines, denoted by $x_k(t)$, are the trajectories of the “follower”, and the dashed
lines, denoted by $\hat{x}_k(t)$, are the planned trajectories, as described before. We use
$\tilde{x}_k(t)$ to denote the trajectory that the follower copies from the leader’s trajectory
but with a time delay of Δ, i.e. $\tilde{x}_k(t + \Delta) = x_{k-1}(t)$. Therefore the cost
along it must be the same as the cost along the leader’s.

Figure A.1: Sketch of Sampled Local Pursuit

The follower leaves the starting state at time $t_k$, while the leader leaves it at time
$t_{k-1}$, where $t_k = t_{k-1} + \Delta$. For $t \in [t_k, t_k + \delta]$, the follower moves
on an optimal trajectory from state $x_k(t_k)$ to $x_{k-1}(t_k)$ over Δ units of time. Thus
from Fact 1:

\[ \eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \Delta) \le C(\tilde{x}_k, t_k, \Delta) = C(x_{k-1}, t_{k-1}, \Delta) \qquad (A.1) \]

The right-hand side is the cost along the leader’s path for the first Δ units of time; the
left-hand side is the optimal cost from $x_k(t_k)$ to $x_{k-1}(t_k)$.

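At time $t_k + \delta$ the follower reaches the state $x_k(t_k + \delta)$. Recalling that the
trajectory driven by $u_k^*(t)$ is optimal from $x_k(t_k)$ to $x_{k-1}(t_k)$, and using Fact 3,
we can divide the cost into two parts, one realized and the other planned*, i.e.

\[ \begin{aligned} & \eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \Delta) \\ & \quad = \eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \delta) + \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) \end{aligned} \qquad (A.2) \]

From (A.1), (A.2):

\[ \begin{aligned} & \eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \delta) \\ & \quad \le C(x_{k-1}, t_{k-1}, \Delta) - \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) \end{aligned} \qquad (A.3) \]

* These two pieces are both optimal with respect to their corresponding end points.

At time $t_k + \delta$, the follower updates its trajectory to catch the leader at its new
location $x_{k-1}(t_k + \delta)$. Because this trajectory is optimal from $x_k(t_k + \delta)$ to
$x_{k-1}(t_k + \delta)$ over time Δ, any path on $[t_k + \delta, t_k + \delta + \Delta]$ that
goes from $x_k(t_k + \delta)$ to $x_{k-1}(t_k + \delta)$ over time Δ and passes through
$x_{k-1}(t_k)$ at time $t_k + \Delta = t_k + \delta + \Delta - \delta$ has equal or greater
cost. From Fact 2 it follows that

\[ \begin{aligned} & \eta(x_k(t_k + \delta), x_{k-1}(t_k + \delta), \Delta, t_k + \delta, \Delta) \\ & \quad \le \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) + \eta(x_{k-1}(t_k), x_{k-1}(t_k + \delta), \delta, t_k + \Delta, \delta) \\ & \quad \le \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) + C(\tilde{x}_k, t_k + \Delta, \delta) \\ & \quad = \eta(x_k(t_k + \delta), x_{k-1}(t_k), \Delta - \delta, t_k + \delta, \Delta - \delta) + C(x_{k-1}, t_k, \delta) \end{aligned} \qquad (A.4) \]

where $\tilde{x}_k(t + \Delta) = x_{k-1}(t)$ is the delayed copy of the leader’s trajectory.
We can also divide this cost into a realized part and a planned one, i.e.

\[ \begin{aligned} & \eta(x_k(t_k + \delta), x_{k-1}(t_k + \delta), \Delta, t_k + \delta, \Delta) \\ & \quad = \eta(x_k(t_k + \delta), x_{k-1}(t_k + \delta), \Delta, t_k + \delta, \delta) + \eta(x_k(t_k + 2\delta), x_{k-1}(t_k + \delta), \Delta - \delta, t_k + 2\delta, \Delta - \delta) \end{aligned} \qquad (A.5) \]

From (A.1) through (A.5), we obtain

\[ \begin{aligned} & C(x_k, t_k, 2\delta) \\ & \quad = \eta(x_k(t_k), x_{k-1}(t_k), \Delta, t_k, \delta) + \eta(x_k(t_k + \delta), x_{k-1}(t_k + \delta), \Delta, t_k + \delta, \delta) \\ & \quad \le C(x_{k-1}, t_{k-1}, \Delta) + C(x_{k-1}, t_k, \delta) - \eta(x_k(t_k + 2\delta), x_{k-1}(t_k + \delta), \Delta - \delta, t_k + 2\delta, \Delta - \delta) \\ & \quad = C(x_{k-1}, t_{k-1}, \Delta + \delta) - C(\hat{x}_k, t_k + 2\delta, \Delta - \delta) \end{aligned} \qquad (A.6) \]

where $\eta(x_k(t_k + 2\delta), x_{k-1}(t_k + \delta), \Delta - \delta, t_k + 2\delta, \Delta - \delta) = C(\hat{x}_k, t_k + 2\delta, \Delta - \delta)$
follows from the fact that the planned trajectory $\hat{x}_k$ is optimal.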


Figure A.2: First two steps in sampled local pursuit

We repeat this procedure until $t = t_k + n\delta$, where $\Delta + (n - 1)\delta < T$ and
$\Delta + n\delta \ge T$. This choice of n means that the leader has not reached the final
state, and

\[ \begin{aligned} C(x_k, t_k, n\delta) & = \sum_{i=0}^{n-1} \eta(x_k(t_k + i\delta), x_{k-1}(t_k + i\delta), \Delta, t_k + i\delta, \delta) \\ & \le C(x_{k-1}, t_{k-1}, \Delta + (n - 1)\delta) - C(\hat{x}_k, t_k + n\delta, \Delta - \delta) \end{aligned} \qquad (A.7) \]

where $\hat{x}_k$ denotes the planned trajectory. When $t \in [t_k + n\delta, t_k + T]$, the
leader has reached the final state and stays static. During this time period, no matter how
many times the follower updates its movement, it will move on the same path that was
determined at time $t = t_k + n\delta$. This path, which is indicated by the last solid line in
Fig. A.1, is locally optimal between the states $x_k(t_k + n\delta)$ and $x_k(t_k + T)$ over
$T - n\delta$ units of time. Therefore

\[ \begin{aligned} & C(x_k, t_k + n\delta, T - n\delta) \\ & \quad = \eta(x_k(t_k + n\delta), x_{k-1}(t_{k-1} + T), T - n\delta, t_k + n\delta, T - n\delta) \\ & \quad \le C(\hat{x}_k, t_k + n\delta, \Delta - \delta) + C(x_{k-1}, t_k + (n - 1)\delta, T - (n - 1)\delta - \Delta) \end{aligned} \qquad (A.8) \]

From (A.7) and (A.8), we obtain

\[ \begin{aligned} C(x_k, t_k, T) & \le C(x_{k-1}, t_{k-1}, \Delta + (n - 1)\delta) + C(x_{k-1}, t_k + (n - 1)\delta, T - (n - 1)\delta - \Delta) \\ & = C(x_{k-1}, t_{k-1}, T) \end{aligned} \qquad (A.9) \]

We have shown that the cost incurred by the follower is no greater than the leader’s. Writing
$C_k = C(x_k, t_k, T)$ for convenience, we see that $C_k \le C_{k-1}$. Clearly $C_k$ is bounded
below if there exists an optimal trajectory from the starting state to the target state. Hence
we conclude that

\[ \lim_{k \to \infty} C_k = C \qquad (A.10) \]
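The monotonicity argument can be observed numerically in a simple setting. The sketch below is
an illustration only: it assumes pursuit in the obstacle-free plane with path length as the
cost (so every locally optimal trajectory is a straight segment), discretizes time in unit
steps, and ignores speed limits. The names `follow`, `m` (Δ in steps) and `s` (δ in steps) are
hypothetical.

```python
import math

def path_length(path):
    """Length of a polyline given as a list of (x, y) samples."""
    return sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:]))

def follow(leader, m, s):
    """One iteration of sampled local pursuit in the plane.

    leader: samples of the leader's trajectory at unit time steps.
    m: following interval Delta in steps; s: update interval delta in steps (s < m).
    The follower starts m steps after the leader and, every s steps, re-aims along a
    straight line at the leader's current position, planning to arrive in m steps.
    Once the leader has stopped, it heads straight for the end point in the time left.
    """
    n = len(leader) - 1
    pos = leader[0]
    path = [pos]
    vel = (0.0, 0.0)
    for j in range(n):
        if j % s == 0:
            if j + m < n:
                target, steps = leader[j + m], m      # "catching up"
            else:
                target, steps = leader[n], n - j      # leader has reached the end
            vel = ((target[0] - pos[0]) / steps, (target[1] - pos[1]) / steps)
        pos = (pos[0] + vel[0], pos[1] + vel[1])
        path.append(pos)
    return path

n, m, s = 200, 20, 5
# Hypothetical initial trajectory: an arc from (0, 0) to (10, 0).
leader = [(10.0 * j / n, 3.0 * math.sin(math.pi * j / n)) for j in range(n + 1)]
lengths = [path_length(leader)]
for _ in range(40):
    leader = follow(leader, m, s)
    lengths.append(path_length(leader))
```

In this setting the recorded lengths never increase from one agent to the next and stay bounded
below by the straight-line distance between the end points, which mirrors the chain of
inequalities (A.1) through (A.9).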
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A. 3  Proof  of  Lemma  4.2 

Snppose  there  exist  more  than  one  limiting  trajectory,  and  snppose  Xi{t)  and  X2(t)  are  two 
possibilities.  xi(t)  differs  from  X2(t)  for  t  G  [^1,^2]  U  [^3,^4] . . ..  From  Lemma  4.1  these  two 
trajectories  mnst  have  the  same  cost. 

Let  the  leader  Xk-iif)  travel  along  Xi(f),  while  the  follower  Xkit)  travels  along  0:2 (t).  If 
no  update  occnrs  dnring  [^1,^2],  X2{t)  has  less  cost  dnring  [^1,^2]  becanse  the  follower  moves 
along  X2{t)  and  the  local  optimnm  is  nniqne.  Same  argnments  on  other  different  time  periods 
lead  to  the  face  that  the  whole  cost  along  X2{t)  is  less  than  Xiit)  if  no  npdate  occnrs  dnring 
t  G  [^1,^2]  U  [ts,  . . .,  which  contradicts  to  the  fact  that  two  trajectories  have  the  same  cost. 

Next, assume only one update occurs during $[t_1, t_2]$, as Fig. A.3 indicates. Separate the curves during $[t_1, t_2]$ into several segments (the meaning of the different curve styles is the same as in Lemma 4.1), and denote the cost along curve $i$ by $C_i$. From the uniqueness of the local optimum, we have $C_1 + C_5 < C_3$ and $C_2 < C_5 + C_4$. Hence $C_1 + C_2 < C_3 + C_4$, which means $x_2(t)$ has less cost than $x_1(t)$ during $[t_1, t_2]$.
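
The step from the two inequalities to the conclusion can be written out explicitly. Adding them, the cost $C_5$ of the shared segment appears on both sides and cancels:

```latex
\begin{align*}
  (C_1 + C_5) + C_2 &< C_3 + (C_5 + C_4) \\
  \Longrightarrow \quad C_1 + C_2 &< C_3 + C_4 .
\end{align*}
```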

If there are multiple updates during $[t_1, t_2]$, the updates do not change the fact that the cost along $x_2(t)$ is less than that along $x_1(t)$. Hence the cost along $x_2(t)$ is less than that along $x_1(t)$ for $t \in [t_1, t_2]$, no matter how many updates occur.


Figure A.3: There is one update between two trajectories.

Iterating over the remaining time periods shows that the total cost along $x_2(t)$ must be less than that along $x_1(t)$. We again obtain a contradiction.


A.4  Proof of Lemma 4.3


Let the leader, taken to be the $(k-1)$th agent, move along the limiting trajectory $x_\infty(t)$. From Lemma 4.2, the limiting trajectory means that $x_{k-1}(t) = x_k(t + \Delta)$ for all $t \in [t_k, t_k + T]$.

We first claim that in the time interval $[t_k + \delta, t_k + \Delta]$ the planned trajectory agrees with the realized one, i.e. $\hat{x}_k(t) = x_k(t)$, $t \in [t_k + \delta, t_k + \Delta]$. Suppose that $\hat{x}_k(t) \ne x_k(t)$ for some $t \in [t_k + \delta, t_k + \Delta]$. Because $x(t)$ is optimal from $x_k(t_k + \delta)$ to $x_k(t_k + \delta + \Delta)$, the trajectory

$$
x(t) =
\begin{cases}
\hat{x}_k(t), & t \in [t_k + \delta,\, t_k + \Delta) \\
x_k(t), & t \in [t_k + \Delta,\, t_k + \delta + \Delta]
\end{cases}
$$

has less cost than the trajectory $x_k(t)$ ($t \in [t_k + \delta, t_k + \delta + \Delta]$), which is updated by the follower at time $t = t_k + \delta$ and is supposed to be optimal from $x_k(t_k + \delta)$ to $x_k(t_k + \delta + \Delta)$. Thus there is a contradiction. Hence we obtain $\hat{x}_k(t) = x_k(t)$ for all $t \in [t_k + \delta, t_k + \Delta]$. The same argument can be applied in the other time periods.

The trajectory $\hat{x}_k(t)$ is smooth for $t \in [t_k, t_k + \Delta]$ because the locally optimal trajectory is smooth, and $x_k(t)$ is smooth for $t \in [t_k + \delta, t_k + \delta + \Delta]$ (the second update step) for the same reason. We also know that $\hat{x}_k(t) = x_k(t)$ for all $t \in [t_k + \delta, t_k + \Delta]$. Thus the actual trajectory $x_k(t)$ ($t \in [t_k, t_k + 2\delta]$) is smooth. Continuing this argument shows that the whole trajectory $x_k(t)$ ($t \in [t_k, t_k + T]$) is smooth.

A.5  Proof of Lemma 4.4


We restate the lemma as follows: if $x^*(t)$ ($t \in [0, t_1 + \Delta_1]$) and $x^*(t)$ ($t \in [t_1, T]$) are two locally optimal trajectories and Condition 4.1 is satisfied, where $0 < t_1 < t_1 + \Delta_1 < T$, then the trajectory $x^*(t)$, $t \in [0, T]$, is a local minimum.

We take $0 < \Delta < \Delta_1$. From the principle of optimality, $x^*(t)$ ($t \in [0, t_1 + \Delta]$) and $x^*(t)$ ($t \in [t_1, T]$) are two locally optimal trajectories with respect to their corresponding end points.

Suppose that $x^*(t)$ ($t \in [0, T]$) is not a local minimum. Then there must exist an $\epsilon' < \epsilon$ and another optimum $x(t) \in B \times [0, T]$ satisfying $\|x(t) - x^*(t)\|_\infty < \epsilon'$ and $C(x(t), 0, T) < C(x^*(t), 0, T)$, as Fig. A.4 shows.


Figure A.4: Overlapped local minimums lead to the local minimum overall.

Construct two optimal trajectories $y_1(t), y_2(t)$, $t \in [t_1, t_1 + \Delta]$, connecting $x(t)$ and $x^*(t)$ such that $x^*(t_1) = y_2(t_1)$, $x^*(t_1 + \Delta) = y_1(t_1 + \Delta)$, $x(t_1) = y_1(t_1)$, $x(t_1 + \Delta) = y_2(t_1 + \Delta)$. From the principle of optimality, $x^*(t)$ and $x(t)$ ($t \in [t_1, t_1 + \Delta]$) are both optimal trajectories with respect to their corresponding end points. Now with the condition of Eq. (4.1), we obtain


$$
\begin{aligned}
C(y_1(t), t_1, \Delta) &< C(x(t), t_1, \Delta) + \mathcal{L}\Delta \\
C(y_2(t), t_1, \Delta) &< C(x^*(t), t_1, \Delta) + \mathcal{L}\Delta
\end{aligned}
\tag{A.11}
$$

Since $x^*(t)$ ($t \in [0, t_1 + \Delta]$) and $x^*(t)$ ($t \in [t_1, T]$) are two unique locally optimal trajectories, we have

$$
\begin{aligned}
C(x^*(t), 0, t_1) + C(x^*(t), t_1, \Delta) &< C(x(t), 0, t_1) + C(y_1(t), t_1, \Delta) \\
C(x^*(t), t_1, \Delta) + C(x^*(t), t_1 + \Delta, T - t_1 - \Delta) &< C(x(t), t_1 + \Delta, T - t_1 - \Delta) + C(y_2(t), t_1, \Delta)
\end{aligned}
\tag{A.12}
$$

Combining (A.11) and (A.12) leads to

$$
C(x^*(t), 0, T) + C(x^*(t), t_1, \Delta) < C(x(t), 0, T) + C(x^*(t), t_1, \Delta) + 2\mathcal{L}\Delta
$$

which simplifies to

$$
C(x^*(t), 0, T) < C(x(t), 0, T) + 2\mathcal{L}\Delta \tag{A.13}
$$

$C(x(t), 0, T)$ is assumed to be less than $C(x^*(t), 0, T)$, but if we take

$$
\Delta < \frac{C(x^*(t), 0, T) - C(x(t), 0, T)}{2\mathcal{L}}
$$

then Eq. (A.13) cannot be true. There is a contradiction because $\Delta$ can be set arbitrarily small. Hence $x^*(t)$ ($t \in [0, T]$) must be the local minimum.
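
The combination of (A.11) and (A.12) can be expanded term by term. The sketch below writes $\mathcal{L}$ for the constant supplied by Condition 4.1 (this choice of symbol is an assumption made here), abbreviates $C(x^*(t), \cdot, \cdot)$ as $C(x^*, \cdot, \cdot)$, and uses additivity of the cost along a trajectory, e.g. $C(x, 0, T) = C(x, 0, t_1) + C(x, t_1, \Delta) + C(x, t_1 + \Delta, T - t_1 - \Delta)$:

```latex
\begin{align*}
  C(x^*,0,T) + C(x^*,t_1,\Delta)
    &= \bigl[C(x^*,0,t_1) + C(x^*,t_1,\Delta)\bigr]
     + \bigl[C(x^*,t_1,\Delta) + C(x^*,t_1+\Delta,T-t_1-\Delta)\bigr] \\
    &< C(x,0,t_1) + C(y_1,t_1,\Delta)
     + C(x,t_1+\Delta,T-t_1-\Delta) + C(y_2,t_1,\Delta) \\
    &< C(x,0,t_1) + C(x,t_1,\Delta) + C(x,t_1+\Delta,T-t_1-\Delta)
     + C(x^*,t_1,\Delta) + 2\mathcal{L}\Delta \\
    &= C(x,0,T) + C(x^*,t_1,\Delta) + 2\mathcal{L}\Delta .
\end{align*}
```

The second line applies both inequalities of (A.12), the third applies both inequalities of (A.11), and cancelling $C(x^*, t_1, \Delta)$ on both sides gives (A.13).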


A.6  Proof of Lemma 4.5

Suppose that $t_k + \lambda + \Delta < T$. As Fig. A.5 indicates, the follower moves on the locally optimal trajectory $x_k(t)$ ($t \in [t_k + \lambda, t_k + \lambda + \Delta]$) at time $t_k + \lambda$. Define a function $G : D \times [0, T] \times \;\to\; B \times [0, T]$ to represent the new trajectory, denoted as $G(\lambda, x_{k-1}(t))$. The cost along the follower's trajectory is

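$$
\begin{aligned}
C(x_k, t_k, T)
&= C(x_k, t_k, \lambda) + \eta\bigl(x_k(t_k + \lambda),\, x_{k-1}(t_k + \lambda),\, \Delta,\, t_k + \lambda,\, \Delta\bigr) + C(x_k, t_k + \lambda + \Delta, T - \lambda - \Delta) \\
&\le C(x_{k-1}, t_{k-1}, \lambda) + C(x_{k-1}, t_{k-1} + \lambda, \Delta) + C(x_{k-1}, t_{k-1} + \lambda + \Delta, T - \lambda - \Delta) \\
&= C(x_{k-1}, t_{k-1}, T)
\end{aligned}
\tag{A.14}
$$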

The same argument can be applied to the case where $t_k + \lambda + \Delta > T$.


Figure A.5: The cost is decreased with a single update.


A.7  Proof of Lemma 4.6

Suppose the cost along the leader's trajectory $x_{k-1}(t)$ ($t \in [t_{k-1}, t_{k-1} + T]$) is $C_{k-1}$. Set up a trajectory sequence $x_k^i(t)$ ($t \in [t_k, t_k + T]$), $i = 1, 2, \ldots$, with corresponding costs $C_k^i$. Let $x_k^1(t) = x_{k-1}(t)$ and $x_k^i(t) = G((i-1)\delta, x_k^{i-1}(t))$, as Fig. A.6 indicates, where $G$ is defined in the proof of Lemma 4.5.

Figure A.6: Trajectory sequence of $x_k^i(t)$.

According to Lemma 4.5,

$$
C_k^i \le C_k^{i-1} \le \cdots \le C_k^1 = C_{k-1}
$$

with $\delta > 0$.

Let $\delta = T/i$; then $\delta \to 0$ as $i \to \infty$. The trajectory $x_k^i(t)$ then undergoes exactly the same updating process as in continuous local pursuit. Therefore the follower's cost satisfies $C_k = C_k^\infty \le C_{k-1}$. Since the sequence $\{C_k\}$ is non-increasing, and bounded below whenever an optimal trajectory exists, it converges to a limit.